Posted to dev@maven.apache.org by Robert Oxspring <ro...@imapmail.org> on 2020/04/24 22:58:53 UTC

Maven Filtering Plugin should avoid overwriting even when filtering

Hi all,


When copying resources with filtering on, files are always overwritten even when the filters have not changed. I’d like to change this such that repeated filtering copies do not modify the destination file.

I’ve prepared a change to write the filtered content to a temporary file and only rename that over the destination if they’re different:
https://github.com/apache/maven-filtering/compare/master...roxspring:feature/avoid-overwrite-on-no-op-filter

Would something like this be acceptable? I’m guessing the next step is to create an issue in Jira, against MNG?

Thanks,

Rob
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org


Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Robert Scholte <rf...@apache.org>.
The main difference is the moment at which you know you have to write the file.
If there's zero interpolation/filtering, then there's no need to write the file.
As soon as you detect interpolation/filtering you can start overwriting the original file.

The challenge is probably to fit this concept into the current codebase.
However, the result would be much more elegant and efficient than creating temporary files.

Robert
 
On 30-4-2020 09:54:55, Rob Oxspring <ro...@imapmail.org> wrote:

> On 30 Apr 2020, at 07:38, Robert Scholte wrote:
>
> I prefer to see an in memory solution.

Well, if it’s reasonable to assume that filtered files are always small, then we could replace the temporary file in my solution with an in-memory buffer... but I’m not sure that’s what you’re aiming at?

> Key should be to detect if filtering is applied, which is done in the MultiDelimiterInterpolatorFilterReaderLineEnding[1]
> Once a value has been interpolated, you must rewrite the file, otherwise you shouldn't.

Again though, this appears to miss the subtlety: “if filtering is applied” is insufficient; the condition needs to be “if filtering is applied with different results than the previous run”. This requires storing some state between runs.

We could scan the source file for filtered values and store just that state (or checksum) in a file between runs. The cost would be an extra read of the source file + state comparison + writing out the state of each filtering. Is this what you’re thinking?

The alternative is to just use the target file as the (not minimal) state. We could read and filter the source file once, while reading and comparing with the target file in parallel. As soon as the contents start to differ then truncate the target file and append the rest of the filtered source to it. The cost here would be an extra full read of the target. Is this what you’re after?

Otherwise I’m at a loss to understand what would be acceptable.

Thanks!

Rob



Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Rob Oxspring <ro...@imapmail.org>.
> On 30 Apr 2020, at 07:38, Robert Scholte <rf...@apache.org> wrote:
> 
> I prefer to see an in memory solution.

Well, if it’s reasonable to assume that filtered files are always small, then we could replace the temporary file in my solution with an in-memory buffer... but I’m not sure that’s what you’re aiming at?
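A minimal sketch of the in-memory variant, assuming the filtered output fits comfortably in memory and is encoded as UTF-8 (the real component uses the configured encoding). The class and method names are hypothetical, not part of maven-filtering: the existing bytes are compared with the new ones and the file is written only when they differ, so an unchanged file keeps its lastModified date.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper illustrating the in-memory idea; not part of maven-filtering.
final class InMemoryFilteredWrite {

    private InMemoryFilteredWrite() {}

    /**
     * Writes the already-filtered text to target only when the target is missing
     * or its bytes differ. Returns true if the file was (re)written.
     * Assumes UTF-8 and content small enough to hold in memory.
     */
    static boolean writeIfChanged(Path target, String filtered) {
        byte[] newBytes = filtered.getBytes(StandardCharsets.UTF_8);
        try {
            if (Files.exists(target)
                    && Files.size(target) == newBytes.length          // cheap check first
                    && Arrays.equals(Files.readAllBytes(target), newBytes)) {
                return false; // identical: content and lastModified stay untouched
            }
            Files.write(target, newBytes); // CREATE, TRUNCATE_EXISTING, WRITE by default
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Holding the whole file in memory is the trade-off Rob mentions: it avoids the temporary file entirely, but only makes sense if filtered resources really are small.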

> Key should be to detect if filtering is applied, which is done in the MultiDelimiterInterpolatorFilterReaderLineEnding[1]
> Once a value has been interpolated, you must rewrite the file, otherwise you shouldn't.

Again though, this appears to miss the subtlety: “if filtering is applied” is insufficient; the condition needs to be “if filtering is applied with different results than the previous run”. This requires storing some state between runs.

We could scan the source file for filtered values and store just that state (or checksum) in a file between runs. The cost would be an extra read of the source file + state comparison + writing out the state of each filtering. Is this what you’re thinking?
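One way to picture the state-file option is the sketch below. It is an illustration under stated assumptions, not maven-filtering code: only ${...} expressions are recognised (the real component supports more than one delimiter style), the state file location is supplied by the caller, and the fingerprint is a SHA-256 digest of the source text plus every referenced key and its resolved value, so an edited template or a changed property both invalidate it. All names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "store just that state (or checksum) in a file between runs".
final class FilterStateFingerprint {

    // Simplification: only ${key} expressions are recognised.
    private static final Pattern EXPR = Pattern.compile("\\$\\{([^}]+)\\}");

    private FilterStateFingerprint() {}

    /** SHA-256 over the source text plus each referenced key and its resolved value. */
    static String fingerprint(String source, Map<String, String> props) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
        md.update(source.getBytes(StandardCharsets.UTF_8));
        Matcher m = EXPR.matcher(source);
        while (m.find()) {
            String key = m.group(1);
            String value = props.get(key);
            // "!" marks an unresolved key, so defining it later changes the fingerprint.
            String entry = value == null ? key + "!\n" : key + "=" + value + "\n";
            md.update(entry.getBytes(StandardCharsets.UTF_8));
        }
        return String.format("%064x", new BigInteger(1, md.digest()));
    }

    /** True if the state file holds exactly this fingerprint, i.e. filtering could be skipped. */
    static boolean isUnchanged(Path stateFile, String fingerprint) {
        try {
            return Files.exists(stateFile)
                    && Files.readString(stateFile, StandardCharsets.UTF_8).equals(fingerprint);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Records the fingerprint; call this only after the filtered file was written successfully. */
    static void record(Path stateFile, String fingerprint) {
        try {
            Path parent = stateFile.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.writeString(stateFile, fingerprint, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This matches the cost Rob lists: one extra read of the source, a comparison, and a state write per filtered file. A property holding a build timestamp still changes the fingerprint on every run, which is the correct outcome.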

The alternative is to just use the target file as the (not minimal) state. We could read and filter the source file once, while reading and comparing with the target file in parallel. As soon as the contents start to differ then truncate the target file and append the rest of the filtered source to it. The cost here would be an extra full read of the target. Is this what you’re after?
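The compare-then-truncate alternative could look roughly like the following sketch (hypothetical names, plain java.io and java.nio only; the filtered output is assumed to arrive as an InputStream). Nothing is written while the bytes match the existing target; at the first difference the target is truncated at that offset and the rest of the filtered output is appended.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of "compare with the target while filtering, truncate on the first difference".
final class CompareThenTruncate {

    private CompareThenTruncate() {}

    /**
     * Streams already-filtered bytes against the existing target. Nothing is written while
     * the bytes match; at the first difference the target is truncated there and the rest
     * of the filtered output is appended. Returns true if the target was created or changed.
     */
    static boolean writeIfDifferent(InputStream filtered, Path target) throws IOException {
        boolean existed = Files.exists(target);
        byte[] in = new byte[8192];
        byte[] existing = new byte[8192];
        try (RandomAccessFile raf = new RandomAccessFile(target.toFile(), "rw")) {
            long pos = 0; // number of leading bytes known to be identical
            int n;
            while ((n = filtered.read(in)) != -1) {
                int m = readUpTo(raf, existing, n);
                int diff = firstDifference(in, existing, n, m);
                if (diff >= 0) {
                    raf.setLength(pos + diff);        // drop the stale tail
                    raf.seek(pos + diff);
                    raf.write(in, diff, n - diff);    // rest of this chunk
                    while ((n = filtered.read(in)) != -1) {
                        raf.write(in, 0, n);          // rest of the filtered stream
                    }
                    return true;
                }
                pos += n;
            }
            if (raf.length() != pos) {
                raf.setLength(pos);                   // target had extra trailing bytes
                return true;
            }
            return !existed; // identical existing file: no write, lastModified untouched
        }
    }

    /** Reads up to len bytes from the current position; returns how many were available. */
    private static int readUpTo(RandomAccessFile raf, byte[] buf, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int r = raf.read(buf, total, len - total);
            if (r == -1) {
                break;
            }
            total += r;
        }
        return total;
    }

    /** Index of the first differing byte among the first n bytes, or -1 if they all match. */
    private static int firstDifference(byte[] a, byte[] b, int n, int m) {
        for (int i = 0; i < n; i++) {
            if (i >= m || a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

Unlike the temporary-file approach, this rewrites the target in place, so a failure part-way through leaves a partially written file; that risk is the price of avoiding the second copy on disk.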

Otherwise I’m at a loss to understand what would be acceptable. 

Thanks!

Rob



Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Robert Scholte <rf...@apache.org>.
I prefer to see an in memory solution. Key should be to detect if filtering is applied, which is done in the MultiDelimiterInterpolatorFilterReaderLineEnding[1]
Once a value has been interpolated, you must rewrite the file, otherwise you shouldn't.
I hope this is a push in the right direction for a more efficient solution.
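To make the "detect if filtering is applied" idea concrete, here is a deliberately simplified interpolator. It is not the real MultiDelimiterInterpolatorFilterReaderLineEnding: it handles only ${key} expressions resolved from a Map and reports whether any substitution happened. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, heavily simplified stand-in for the real filter reader.
final class TrackingInterpolator {

    private static final Pattern EXPR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Filtered text plus a flag saying whether any expression was replaced. */
    static final class Result {
        final String text;
        final boolean interpolated;

        Result(String text, boolean interpolated) {
            this.text = text;
            this.interpolated = interpolated;
        }
    }

    private TrackingInterpolator() {}

    static Result filter(String source, Map<String, String> props) {
        Matcher m = EXPR.matcher(source);
        StringBuilder out = new StringBuilder();
        boolean interpolated = false;
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) {
                // Unknown keys are left as-is, which does not count as interpolation.
                m.appendReplacement(out, Matcher.quoteReplacement(m.group()));
            } else {
                m.appendReplacement(out, Matcher.quoteReplacement(value));
                interpolated = true;
            }
        }
        m.appendTail(out);
        return new Result(out.toString(), interpolated);
    }
}
```

As Rob points out in his reply, interpolated == true only says that the output depends on filtered values; it does not say those values differ from the previous run, so on its own this flag cannot decide whether the target must be rewritten.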

Robert


[1] https://github.com/apache/maven-filtering/blob/master/src/main/java/org/apache/maven/shared/filtering/MultiDelimiterInterpolatorFilterReaderLineEnding.java#L414-L423
On 29-4-2020 23:43:19, Robert Oxspring <ro...@imapmail.org> wrote:


> On 29 Apr 2020, at 14:43, Tobias wrote:
>
> Hi,
>
> I just came across this issue the other day, too. A mvn install after a
> mvn clean install, i.e. nothing has to be compiled, results in a jar
> being created for one module which takes one minute. This is due to
> filtered resources being overwritten though this is not necessary. This
> happens for multiple modules in our build. I too think that the speed
> of the build process could be greatly improved if resources wouldn't be
> overwritten if nothing has changed.
>
>> I think we’re doomed to having to recreate the file each time, because we have no way of knowing whether the file has changed until you generate it.
>
> I'm new to Maven and just discovering it, but my sketch of a naive and
> simple, not optimized idea was the following:
>
> - Identify all locations where information can be stored which is used
> to generate filtered resources (pom.xml, .properties files, ...?)
> - For all these locations, get the date it was last written to
> - Take the most recent date from all these dates
> - For each filtered resource which is already in the target folder from
> the last build, take the date of this resource file and check if this
> date is newer than the date identified in the previous step. If it is,
> don't copy the filtered resource to the target folder
>
> Is this not possible?

If we need to look at all potential sources of filterable properties then that will include environment variables which aren’t necessarily associated with a file, and potentially properties that include a timestamp and will necessarily be different every run, so using the lastModified date of some set of input files is not practical.

I’ve opened a pull request that writes the filtered content to a temporary file and compares it with the previous result. If the contents are equal, the temporary file is deleted and the target file is left unmodified; otherwise we rename the temporary file over the target, updating its lastModified date implicitly. This approach requires no knowledge of what filtering was performed, or where those properties came from.

Admittedly it means that the Maven Filtering Plugin is still doing a little more IO work each run, and it temporarily consumes 2x the storage in the target volume, but filtered resources tend to be small and disk space is relatively cheap so I think this is acceptable. This approach does recreate _a_ file each time, but avoids recreating _the_ file each time, and can avoid triggering downstream work based on the lastModified date.

I think this is the simplest and lowest-risk solution, and it doesn’t preclude trying something fancier later.

https://github.com/apache/maven-filtering/pull/5

Feedback welcome!

Thanks,

Rob





Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Robert Oxspring <ro...@imapmail.org>.

> On 29 Apr 2020, at 14:43, Tobias <to...@freenet.de> wrote:
> 
> Hi,
> 
> I just came across this issue the other day, too. A mvn install after a
> mvn clean install, i.e. nothing has to be compiled, results in a jar
> being created for one module which takes one minute. This is due to
> filtered resources being overwritten though this is not necessary. This
> happens for multiple modules in our build. I too think that the speed
> of the build process could be greatly improved if resources wouldn't be
> overwritten if nothing has changed.
> 
>> I think we’re doomed to having to recreate the file each time, because we have no way of knowing whether the file has changed until you generate it.
> 
> I'm new to Maven and just discovering it, but my sketch of a naive and
> simple, not optimized idea was the following:
> 
> - Identify all locations where information can be stored which is used
> to generate filtered resources (pom.xml, .properties files, ...?)
> - For all these locations, get the date it was last written to
> - Take the most recent date from all these dates
> - For each filtered resource which is already in the target folder from
> the last build, take the date of this resource file and check if this
> date is newer than the date identified in the previous step. If it is,
> don't copy the filtered resource to the target folder
> 
> Is this not possible?

If we need to look at all potential sources of filterable properties then that will include environment variables which aren’t necessarily associated with a file, and potentially properties that include a timestamp and will necessarily be different every run, so using the lastModified date of some set of input files is not practical.

I’ve opened a pull request that writes the filtered content to a temporary file and compares it with the previous result. If the contents are equal, the temporary file is deleted and the target file is left unmodified; otherwise we rename the temporary file over the target, updating its lastModified date implicitly. This approach requires no knowledge of what filtering was performed, or where those properties came from.

Admittedly it means that the Maven Filtering Plugin is still doing a little more IO work each run, and it temporarily consumes 2x the storage in the target volume, but filtered resources tend to be small and disk space is relatively cheap so I think this is acceptable. This approach does recreate _a_ file each time, but avoids recreating _the_ file each time, and can avoid triggering downstream work based on the lastModified date.
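A minimal sketch of the temporary-file approach described above, assuming Java 12+ for Files.mismatch. The FilterStep interface is a hypothetical stand-in for whatever produces the filtered text; it is not the API used in the pull request. The temporary file is created next to the target so that the final move is a rename on the same volume.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of "filter to a temp file, replace the target only if different".
final class TempFileFilteredWrite {

    /** Hypothetical hook standing in for the real filtering step. */
    interface FilterStep {
        void writeTo(Writer out) throws IOException;
    }

    private TempFileFilteredWrite() {}

    /** Returns true if the target was created or replaced, false if it was left untouched. */
    static boolean writeIfChanged(Path target, FilterStep step) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Same directory as the target, so the move below is normally a plain rename.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            try (Writer w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                step.writeTo(w);
            }
            if (Files.exists(target) && Files.mismatch(tmp, target) == -1L) {
                return false; // equal: target keeps its content and lastModified date
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } finally {
            Files.deleteIfExists(tmp); // removes the temp file when unused; no-op after a move
        }
    }
}
```

The temporary file briefly doubles the storage for that resource, which is the 2x cost mentioned above; in exchange, the target is never left half-written.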

I think this is the simplest and lowest-risk solution, and it doesn’t preclude trying something fancier later.

https://github.com/apache/maven-filtering/pull/5

Feedback welcome!

Thanks,

Rob





Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Tobias <to...@freenet.de>.
Hi,

I just came across this issue the other day, too. A mvn install after a
mvn clean install, i.e. nothing has to be compiled, results in a jar
being created for one module which takes one minute. This is due to
filtered resources being overwritten though this is not necessary. This
happens for multiple modules in our build. I too think that the speed
of the build process could be greatly improved if resources wouldn't be
overwritten if nothing has changed.

> I think we’re doomed to having to recreate the file each time, because we have no way of knowing whether the file has changed until you generate it.

I'm new to Maven and just discovering it, but my sketch of a naive and
simple, not optimized idea was the following:

- Identify all locations where information can be stored which is used
to generate filtered resources (pom.xml, .properties files, ...?)
- For all these locations, get the date it was last written to
- Take the most recent date from all these dates
- For each filtered resource which is already in the target folder from
the last build, take the date of this resource file and check if this
date is newer than the date identified in the previous step. If it is,
don't copy the filtered resource to the target folder
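Tobias's steps can be sketched as a simple staleness check. The names are hypothetical, and the caller is assumed to know the complete list of input files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Collection;

// Hypothetical sketch of the lastModified-based check proposed above.
final class TimestampCheck {

    private TimestampCheck() {}

    /** True if target exists and is strictly newer than the newest input, so the copy could be skipped. */
    static boolean isUpToDate(Path target, Collection<Path> inputs) {
        try {
            if (!Files.exists(target)) {
                return false;
            }
            // Steps 2 and 3: the most recent lastModified across all known inputs.
            FileTime newestInput = null;
            for (Path input : inputs) {
                FileTime t = Files.getLastModifiedTime(input);
                if (newestInput == null || t.compareTo(newestInput) > 0) {
                    newestInput = t;
                }
            }
            // Step 4: skip the copy only if the existing target is newer than all of them.
            return newestInput == null
                    || Files.getLastModifiedTime(target).compareTo(newestInput) > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch also shows the weakness Rob raises in his reply: environment variables, command-line properties and timestamp values have no file, so they can never appear in inputs, and a change to them would go unnoticed.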

Is this not possible?

I'd also like to add that I found a similar thread when searching for
this issue:
https://users.maven.apache.narkive.com/FAqDGEfS/avoiding-overwriting-newer-file-when-using-maven-resources-plugin-copy-resources-with-filtering

Best regards
Tobias



On 2020/04/25 13:58:15, Karl Heinz Marbaise <k....@gmx.de> wrote:
> Hi,
>
> On 25.04.20 00:58, Robert Oxspring wrote:
> > Hi all,
> >
> >
> > When copying resources with filtering on, files are always overwritten even when the filters have not changed. I’d like to change this such that repeated filtering copies do not modify the destination file.
> >
> > I’ve prepared a change to write the filtered content to a temporary file and only rename that over the destination if they’re different:
> > https://github.com/apache/maven-filtering/compare/master...roxspring:feature/avoid-overwrite-on-no-op-filter
> >
> > Would something like this be acceptable? I’m guessing the next step is to create an issue in Jira, against MNG?
>
> There is a JIRA link on the GitHub repository... creating a JIRA issue is a
> prerequisite.
>
> Maven Filtering is a component which is used by
> maven-resources-plugin... If you'd like to make that effort, that would be great;
> check whether all tests pass, and maybe we will merge it...
>
> Kind regards
> Karl Heinz Marbaise
>
>
> >
> > Thanks,
> >
> > Rob



Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Karl Heinz Marbaise <kh...@gmx.de>.
Hi,

On 25.04.20 00:58, Robert Oxspring wrote:
> Hi all,
>
>
> When copying resources with filtering on, files are always overwritten even when the filters have not changed. I’d like to change this such that repeated filtering copies do not modify the destination file.
>
> I’ve prepared a change to write the filtered content to a temporary file and only rename that over the destination if they’re different:
> https://github.com/apache/maven-filtering/compare/master...roxspring:feature/avoid-overwrite-on-no-op-filter
>
> Would something like this be acceptable? I’m guessing the next step is to create an issue in Jira, against MNG?

There is a JIRA link on the GitHub repository... creating a JIRA issue is a
prerequisite.

Maven Filtering is a component which is used by
maven-resources-plugin... If you'd like to make that effort, that would be great;
check whether all tests pass, and maybe we will merge it...

Kind regards
Karl Heinz Marbaise


>
> Thanks,
>
> Rob
>



Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Graham Leggett <mi...@sharp.fm>.
On 25 Apr 2020, at 17:12, Robert Oxspring <ro...@imapmail.org> wrote:

>> I tend to think it does not need any hash, just lastUpdated
>> tracking (the most recent file invalidating the previous cache is enough and
>> faster than any hash computation)
> 
> I’m not totally sure of this. Filtering, for example, is particularly dependent on properties and configuration which aren’t directly linked with a source file change.

Exactly - this is the key limitation.

I think we’re doomed to having to recreate the file each time, because we have no way of knowing whether the file has changed until you generate it.

But there is nothing stopping us, when we detect that the file is the same, from leaving the original file untouched. This will have the effect of not triggering unnecessary downstream changes.

Concrete example: A filter generates a file that is input to a code generator. Because the file is rewritten every time, the code generator runs every time, which is not good. But if we detected the file as the same and didn’t write the filtered file, the code generator wouldn’t run, and we would safely avoid an unnecessary rebuild.

Regards,
Graham
—


Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Romain Manni-Bucau <rm...@gmail.com>.
On Sat, 25 Apr 2020 at 18:38, Robert Oxspring <ro...@imapmail.org> wrote:

>
> > On 25 Apr 2020, at 16:31, Romain Manni-Bucau <rm...@gmail.com> wrote:
> >
> > On Sat, 25 Apr 2020 at 17:12, Robert Oxspring <ro...@imapmail.org> wrote:
> >
> >>
> >>> On 25 Apr 2020, at 15:37, Romain Manni-Bucau <rm...@gmail.com> wrote:
> >>>
> >>> Hi Robert, two thoughts maybe
> >>>
> >>> 1. all that work (thinking also of the assembly one) should probably end up enriching the
> >>> https://github.com/apache/maven-shared-incremental/blob/master/src/main/java/org/apache/maven/shared/incremental/IncrementalBuildHelper.java
> >>> model and impl.
> >>
> >> Great! It’s been a couple of years since I’ve worked on a Maven project, so
> >> pointers towards infrastructure pieces I’ve missed or forgotten are very
> >> helpful! I’ll take a look at the incremental build helper.
> >>
> >>> I tend to think it does not need any hash, just lastUpdated
> >>> tracking (the most recent file invalidating the previous cache is enough and
> >>> faster than any hash computation)
> >>
> >> I’m not totally sure of this. Filtering, for example, is particularly
> >> dependent on properties and configuration which aren’t directly linked with
> >> a source file change. Allowing the feature to consider wider inputs should
> >> make the feature more reliable, and might avoid the need to make it
> >> optional. Definitely up for running the numbers and seeing the impact
> >> before deciding though!
> >>
> >
> > Hmm, the more you put in there, the more you'll defeat your goal (I didn't check on
> > Maven, but I did it for some "big" downloads whose size is comparable to a
> > Maven project, and hashing was a disaster if done globally).
> > Overall, if you depend on properties then you have to use more advanced
> > hashing, but you can also just rebuild in general, so timestamps still work: you just
> > force the rebuild in such a case.
> > There is also the issue of plugins adding properties and downstream plugins
> > using them; here you would need a state manager you reinject into
> > Maven... which will very likely be wrong at rebuild time.
> > So maybe let's start simple?
>
> This does take us somewhat full circle: The diff I linked to at the start
> of this thread was a far simpler place to start. No state storage. Files
> only modified if the content changes. Tbh, with this change in place I
> suspect I’d be happy with lastModified driven incremental builds.
>
> >>> 2. by default nothing should be skipped - I guess a -Dxxx.incremental=true
> >>> should be added, maybe even to Maven core (see next point); all this theory
> >>> is based on fully defining inputs/outputs. This is what Gradle
> >>> implemented, and I almost never saw a single complex Gradle build correctly
> >>> configured to not bypass modules it shouldn't bypass (or worse, skip tests
> >>> it shouldn't), so I think we should be conservative here. Also, several plugins
> >>> refetch data when executed, so even if the result of 1 is "no change" you
> >>> may need to re-execute the plugin (including the filtering one if the
> >>> filtered data depends on the time or so - I don't want to debate whether it is
> >>> good or bad practice, but it is used a lot in CI/CD pipelines).
> >>
> >> Interesting. To me the default should be that nothing is repeated. Our
> >> experiences of Gradle are entirely opposed too - I’ve never seen its
> >> incremental build support doing the wrong thing. I’m well aware of the
> >> necessity to support timestamps when filtering though.
> >>
> >
> > At the beginning I was wondering how Gradle was so fast... then I realized it
> > was not doing its job. I probably didn't review enough projects (a few
> > dozen), but not a single one was configured well enough to ensure the build was
> > deterministic unless you killed the daemon and cleaned the project before the
> > rebuild. It is very nasty and I hope Maven does not get that kind of
> > behavior *by default* (to explain the '*': it is fine with me to have it in
> > dev, since on some projects it can help dev iterations).
> >
> >
> >>
> >>> 3. Being global in Maven rather than per plugin sounds better than starting to
> >>> be specific in all plugins. I really see it as a dev feature you can
> >>> activate while working and not enable on the CI, for example. Added to
> >>> maven-core, it would let a mojo define the preconditions that are checked
> >>> with 1, and potentially the conditions that unconditionally trigger a
> >>> rebuild. Maybe a @MojoInput("${project.build.outputDirectory}",
> >>> "${project.artifacts}", "method:getState()") or so.
> >>
> >> Yeah, that’s all looking much closer to what I’ve used in Gradle more
> >> recently. It would make it a bigger piece of work, but I’m happy to explore
> >> that if it’s the preferred option!
> >>
> >
> > BTW, we should have started with this, but do you have figures for your
> > project?
> > Something like "each mojo takes X ms", I modify a class, the estimated rebuild
> > time is "Y ms" (maybe call the mojos manually, like "mvn compiler:compile
> > jar:jar" etc), the actual rebuild time with "mvn install" is "Z ms" (+
> > drilldown per mojo).
>
> Does seem a sensible place to start. Is there a reasonable way to extract such
> timing information? I was hoping that grepping a debug log would be
> sufficient, but no timings are obvious beyond the overall summary. Figuring
> out the relevant mojos and manually timing them for 30 modules is pretty boring.
>

I guess you could quickly implement a log parser and extract the timings from
there after enabling time logging, or just grab a profiler?


>
> >
> > I don't know if Hervé has read this already, but couldn't this be close to the
> > reproducible builds track, which has a kind of state? It would probably need to
> > be made more generic, but it sounds like a "standard"/existing option to
> > investigate.
>
> Didn’t follow any of that but seems it was mostly for Hervé.
>

Yep, the overall idea is that reproducible builds require the ability to check a
previous build's state. For now that is mainly the output, but I suspect the
input is not that far off, so I just wonder whether we should aim to converge at
some point. Really an open question at the moment.


>
> >
> >
> >>
> >> Thanks for the input!
> >>
> >> Rob
> >>
> >>
> >>
>
>
>
>

Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Robert Oxspring <ro...@imapmail.org>.
> On 25 Apr 2020, at 16:31, Romain Manni-Bucau <rm...@gmail.com> wrote:
> 
> On Sat, 25 Apr 2020 at 17:12, Robert Oxspring <ro...@imapmail.org> wrote:
> 
>> 
>>> On 25 Apr 2020, at 15:37, Romain Manni-Bucau <rm...@gmail.com> wrote:
>>> 
>>> Hi Robert, two thoughts maybe
>>> 
>>> 1. all that work (thinking also of the assembly one) should probably end up enriching the
>>> https://github.com/apache/maven-shared-incremental/blob/master/src/main/java/org/apache/maven/shared/incremental/IncrementalBuildHelper.java
>>> model and impl.
>> 
>> Great! It’s been a couple of years since I’ve worked on a Maven project, so
>> pointers towards infrastructure pieces I’ve missed or forgotten are very
>> helpful! I’ll take a look at the incremental build helper.
>> 
>>> I tend to think it does not need any hash, just lastUpdated
>>> tracking (the most recent file invalidating the previous cache is enough and
>>> faster than any hash computation)
>> 
>> I’m not totally sure of this. Filtering, for example, is particularly
>> dependent on properties and configuration which aren’t directly linked with
>> a source file change. Allowing the feature to consider wider inputs should
>> make the feature more reliable, and might avoid the need to make it
>> optional. Definitely up for running the numbers and seeing the impact
>> before deciding though!
>> 
> 
> Hmm, the more you put in there, the more you'll defeat your goal (I didn't check on
> Maven, but I did it for some "big" downloads whose size is comparable to a
> Maven project, and hashing was a disaster if done globally).
> Overall, if you depend on properties then you have to use more advanced
> hashing, but you can also just rebuild in general, so timestamps still work: you just
> force the rebuild in such a case.
> There is also the issue of plugins adding properties and downstream plugins
> using them; here you would need a state manager you reinject into
> Maven... which will very likely be wrong at rebuild time.
> So maybe let's start simple?

This does take us somewhat full circle: The diff I linked to at the start of this thread was a far simpler place to start. No state storage. Files only modified if the content changes. Tbh, with this change in place I suspect I’d be happy with lastModified driven incremental builds. 

>>> 2. by default nothing should be skipped - I guess a -Dxxx.incremental=true
>>> should be added, maybe even to Maven core (see next point); all this theory
>>> is based on fully defining inputs/outputs. This is what Gradle
>>> implemented, and I almost never saw a single complex Gradle build correctly
>>> configured to not bypass modules it shouldn't bypass (or worse, skip tests
>>> it shouldn't), so I think we should be conservative here. Also, several plugins
>>> refetch data when executed, so even if the result of 1 is "no change" you
>>> may need to re-execute the plugin (including the filtering one if the
>>> filtered data depends on the time or so - I don't want to debate whether it is
>>> good or bad practice, but it is used a lot in CI/CD pipelines).
>> 
>> Interesting. To me the default should be that nothing is repeated. Our
>> experiences of Gradle are entirely opposed too - I’ve never seen its
>> incremental build support doing the wrong thing. I’m well aware of the
>> necessity to support timestamps when filtering though.
>> 
> 
> At the beginning I was wondering how Gradle was so fast... then I realized it
> was not doing its job. I probably didn't review enough projects (a few
> dozen), but not a single one was configured well enough to ensure the build was
> deterministic unless you killed the daemon and cleaned the project before the
> rebuild. It is very nasty and I hope Maven does not get that kind of
> behavior *by default* (to explain the '*': it is fine with me to have it in
> dev, since on some projects it can help dev iterations).
> 
> 
>> 
>>> 3. Being global in Maven rather than per plugin sounds better than starting to
>>> be specific in all plugins. I really see it as a dev feature you can
>>> activate while working and not enable on the CI, for example. Added to
>>> maven-core, it would let a mojo define the preconditions that are checked
>>> with 1, and potentially the conditions that unconditionally trigger a
>>> rebuild. Maybe a @MojoInput("${project.build.outputDirectory}",
>>> "${project.artifacts}", "method:getState()") or so.
>> 
>> Yeah, that’s all looking much closer to what I’ve used in Gradle more
>> recently. It would make it a bigger piece of work, but I’m happy to explore
>> that if it’s the preferred option!
>> 
> 
> BTW, we should have started with this, but do you have figures for your
> project?
> Something like "each mojo takes X ms", I modify a class, the estimated rebuild
> time is "Y ms" (maybe call the mojos manually, like "mvn compiler:compile
> jar:jar" etc), the actual rebuild time with "mvn install" is "Z ms" (+
> drilldown per mojo).

Does seem a sensible place to start. Is there a reasonable way to extract such timing information? I was hoping that grepping a debug log would be sufficient, but no timings are obvious beyond the overall summary. Figuring out the relevant mojos and manually timing them for 30 modules is pretty boring.

> 
> I don't know if Hervé has read this already, but couldn't this be close to the
> reproducible builds track, which has a kind of state? It would probably need to
> be made more generic, but it sounds like a "standard"/existing option to
> investigate.

Didn’t follow any of that but seems it was mostly for Hervé. 

> 
> 
>> 
>> Thanks for the input!
>> 
>> Rob
>> 
>> 
>> 




Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Romain Manni-Bucau <rm...@gmail.com>.
On Sat, 25 Apr 2020 at 17:12, Robert Oxspring <ro...@imapmail.org> wrote:

>
> > On 25 Apr 2020, at 15:37, Romain Manni-Bucau <rm...@gmail.com> wrote:
> >
> > Hi Robert, two thoughts maybe
> >
> > 1. all that work (thinking also of the assembly one) should probably end up enriching the
> > https://github.com/apache/maven-shared-incremental/blob/master/src/main/java/org/apache/maven/shared/incremental/IncrementalBuildHelper.java
> > model and impl.
>
> Great! It’s been a couple of years since I’ve worked on a Maven project, so
> pointers towards infrastructure pieces I’ve missed or forgotten are very
> helpful! I’ll take a look at the incremental build helper.
>
> > I tend to think it does not need any hash, just lastUpdated
> > tracking (the most recent file invalidating the previous cache is enough and
> > faster than any hash computation)
>
> I’m not totally sure of this. Filtering, for example, is particularly
> dependent on properties and configuration which aren’t directly linked with
> a source file change. Allowing the feature to consider wider inputs should
> make the feature more reliable, and might avoid the need to make it
> optional. Definitely up for running the numbers and seeing the impact
> before deciding though!
>

Hmm, the more you put in there, the more you'll defeat your goal (I didn't check
on Maven, but I did it for some "big" downloads whose size is comparable to a
Maven project, and hashing was a disaster when done globally).
In general, if you depend on properties then you have to use more advanced
hashing, but you can also just rebuild, so timestamps still work: you simply
force the rebuild in that case.
There is also the issue of plugins adding properties and downstream plugins
using them; here you would need a state manager that you reinject into
Maven... which will very likely be wrong at rebuild time.
So maybe let's start simple?
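The timestamp approach described above (the most recent input invalidates the output, with a forced rebuild whenever property-driven inputs make timestamps unreliable) can be sketched as follows. This is a minimal illustration assuming Java 11+; `TimestampCheck` and its `forceRebuild` parameter are hypothetical names, not part of maven-shared-incremental or any Maven API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;

/** Hypothetical helper: timestamp-only staleness check with an explicit "force" escape hatch. */
final class TimestampCheck {
    private TimestampCheck() {}

    /**
     * Returns true when the output must be regenerated: the caller forces it
     * (e.g. because filtering depends on properties), the output is missing,
     * or any input is missing or newer than the output.
     */
    static boolean isStale(List<Path> inputs, Path output, boolean forceRebuild) throws IOException {
        if (forceRebuild || !Files.exists(output)) {
            return true;
        }
        FileTime outputTime = Files.getLastModifiedTime(output);
        for (Path input : inputs) {
            // Treat a missing input as a change, to stay on the safe side.
            if (!Files.exists(input)) {
                return true;
            }
            // A single input newer than the output is enough to invalidate it.
            if (Files.getLastModifiedTime(input).compareTo(outputTime) > 0) {
                return true;
            }
        }
        return false;
    }
}
```

No hashing is involved: the cost is one metadata lookup per file, which is the speed argument made above. The price is that anything not represented by a file timestamp has to be handled by forcing the rebuild.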


>
> > 2. By default nothing should be skipped. I guess a -Dxxx.incremental=true
> > should be added, maybe even to maven core (see next point). All this theory
> > relies on inputs/outputs being fully defined. This is what Gradle
> > implemented, and I almost never saw a single complex Gradle build correctly
> > configured to not bypass modules it shouldn't bypass (or worse, skip tests
> > it shouldn't), so I think we should be conservative here. Also, several plugins
> > refetch data when executed, so even if the result of 1 is "no change" you
> > may need to re-execute the plugin (including the filtering one if the
> > filtered data depends on the time or so; I don't want to discuss whether it is
> > good or bad practice, but it is used a lot in CI/CD pipelines).
>
> Interesting. To me the default should be that nothing is repeated. Our
> experiences of Gradle are entirely opposed too: I’ve never seen its
> incremental build support do the wrong thing. I’m well aware of the
> necessity to support timestamps when filtering though.
>

At first I was wondering how Gradle was so fast... then I realized it
was not doing its job. I probably didn't review enough projects (a dozen or
so), but not a single one was configured well enough to make the build
deterministic unless you killed the daemon and cleaned the project before
rebuilding. It is very nasty, and I hope Maven does not get that kind of
behavior *by default* (to explain the '*': it is fine by me to have it in
dev, since on some projects it can help dev iterations).


>
> > 3. Being global in Maven, rather than per plugin, sounds better than starting
> > to be specific in every plugin. I really see it as a dev feature you can
> > activate while working and not enable on the CI, for example. Added to
> > maven-core, it could allow mojos to define the preconditions which would be
> > checked with 1, and potentially the conditions that unconditionally trigger a
> > rebuild. Maybe a @MojoInput("${project.build.outputDirectory}",
> > "${project.artifacts}", "method:getState()") or so.
>
> Yeah, that’s all looking much closer to what I’ve used in Gradle more
> recently. It would make it a bigger piece of work, but I’m happy to explore
> that if it’s the preferred option!
>

BTW, we should have started with this, but do you have figures for your
project?
Something like: "each mojo takes X ms"; I modify a class and the estimated
rebuild time is "Y ms" (maybe calling the mojos manually, like "mvn
compiler:compile jar:jar" etc.); the actual rebuild time with "mvn install"
is "Z ms" (+ a drill-down per mojo).

I don't know if Hervé has read this already, but couldn't this be close to the
reproducible builds track, which has a kind of state? It would probably need to be
made more generic, but it sounds like a "standard"/existing option worth
investigating.


>
> Thanks for the input!
>
> Rob
>
>
>

Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Robert Oxspring <ro...@imapmail.org>.
> On 25 Apr 2020, at 15:37, Romain Manni-Bucau <rm...@gmail.com> wrote:
> 
> Hi Robert, two thoughts maybe
> 
> 1. All that work (thinking also of the assembly one) should probably end up
> enriching the
> https://github.com/apache/maven-shared-incremental/blob/master/src/main/java/org/apache/maven/shared/incremental/IncrementalBuildHelper.java
> model and impl.

Great! It’s been a couple of years since I’ve worked on a Maven project, so pointers towards infrastructure pieces I’ve missed or forgotten are very helpful! I’ll take a look at the incremental build helper. 

> I tend to think it does not need any hash, just lastUpdated
> tracking (the most recent file invalidating the previous cache is enough, and
> faster than any hash computation).

I’m not totally sure of this. Filtering, for example, is particularly dependent on properties and configuration which aren’t directly linked with a source file change. Allowing the feature to consider wider inputs should make it more reliable, and might avoid the need to make it optional. Definitely up for running the numbers and seeing the impact before deciding though!
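The point that filtered output depends on properties as well as on the source file can be illustrated with a content fingerprint. The sketch below is hypothetical (`FilterFingerprint` is not a maven-filtering class) and assumes the effective filter properties are available as a `Map<String, String>` with non-null keys and values. It hashes the source bytes plus the sorted properties with SHA-256, so a changed property value changes the fingerprint even when no file timestamp moves.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: fingerprints a filtered resource's inputs (source bytes + filter properties). */
final class FilterFingerprint {
    private FilterFingerprint() {}

    static String fingerprint(Path source, Map<String, String> filterProperties) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", e);
        }
        byte[] content = Files.readAllBytes(source);
        // Prefix the content with its length so it cannot run into the property section.
        update(digest, Integer.toString(content.length));
        digest.update(content);
        // Sort by key so the fingerprint does not depend on map iteration order.
        for (Map.Entry<String, String> entry : new TreeMap<>(filterProperties).entrySet()) {
            update(digest, entry.getKey());
            update(digest, entry.getValue());
        }
        // 32-byte SHA-256 digest rendered as 64 zero-padded hex characters.
        return String.format("%064x", new BigInteger(1, digest.digest()));
    }

    /** Feeds a string followed by a NUL separator, so "ab"+"c" and "a"+"bc" hash differently. */
    private static void update(MessageDigest digest, String value) {
        digest.update(value.getBytes(StandardCharsets.UTF_8));
        digest.update((byte) 0);
    }
}
```

A caller would store the fingerprint next to the output and skip filtering when it matches. The trade-off raised in the reply is that reading and hashing every input costs more than a timestamp lookup.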

> 2. By default nothing should be skipped. I guess a -Dxxx.incremental=true
> should be added, maybe even to maven core (see next point). All this theory
> relies on inputs/outputs being fully defined. This is what Gradle
> implemented, and I almost never saw a single complex Gradle build correctly
> configured to not bypass modules it shouldn't bypass (or worse, skip tests
> it shouldn't), so I think we should be conservative here. Also, several plugins
> refetch data when executed, so even if the result of 1 is "no change" you
> may need to re-execute the plugin (including the filtering one if the
> filtered data depends on the time or so; I don't want to discuss whether it is
> good or bad practice, but it is used a lot in CI/CD pipelines).

Interesting. To me the default should be that nothing is repeated. Our experiences of Gradle are entirely opposed too: I’ve never seen its incremental build support do the wrong thing. I’m well aware of the necessity to support timestamps when filtering though. 

> 3. Being global in Maven, rather than per plugin, sounds better than starting
> to be specific in every plugin. I really see it as a dev feature you can
> activate while working and not enable on the CI, for example. Added to
> maven-core, it could allow mojos to define the preconditions which would be
> checked with 1, and potentially the conditions that unconditionally trigger a
> rebuild. Maybe a @MojoInput("${project.build.outputDirectory}",
> "${project.artifacts}", "method:getState()") or so.

Yeah, that’s all looking much closer to what I’ve used in Gradle more recently. It would make it a bigger piece of work, but I’m happy to explore that if it’s the preferred option!

Thanks for the input!

Rob



Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Hi Robert, two thoughts maybe

1. All that work (thinking also of the assembly one) should probably end up
enriching the
https://github.com/apache/maven-shared-incremental/blob/master/src/main/java/org/apache/maven/shared/incremental/IncrementalBuildHelper.java
model and impl. I tend to think it does not need any hash, just lastUpdated
tracking (the most recent file invalidating the previous cache is enough, and
faster than any hash computation).
2. By default nothing should be skipped. I guess a -Dxxx.incremental=true
should be added, maybe even to maven core (see next point). All this theory
relies on inputs/outputs being fully defined. This is what Gradle
implemented, and I almost never saw a single complex Gradle build correctly
configured to not bypass modules it shouldn't bypass (or worse, skip tests
it shouldn't), so I think we should be conservative here. Also, several plugins
refetch data when executed, so even if the result of 1 is "no change" you
may need to re-execute the plugin (including the filtering one if the
filtered data depends on the time or so; I don't want to discuss whether it is
good or bad practice, but it is used a lot in CI/CD pipelines).
3. Being global in Maven, rather than per plugin, sounds better than starting
to be specific in every plugin. I really see it as a dev feature you can
activate while working and not enable on the CI, for example. Added to
maven-core, it could allow mojos to define the preconditions which would be
checked with 1, and potentially the conditions that unconditionally trigger a
rebuild. Maybe a @MojoInput("${project.build.outputDirectory}",
"${project.artifacts}", "method:getState()") or so.
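To make point 3 concrete, here is a sketch of what such a declaration could look like. Everything in it is hypothetical: `@MojoInput`, `ExampleFilteringMojo` and `MojoInputs` do not exist in the Maven plugin API. Note that for a `String[]` value Java requires braces, so the suggested `@MojoInput("a", "b", "c")` would be written `@MojoInput({"a", "b", "c"})`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.List;

/** HYPOTHETICAL annotation, not part of the Maven plugin API: declares what a mojo's result depends on. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface MojoInput {
    /** Expressions or "method:" references that a core would resolve before deciding to re-run the mojo. */
    String[] value();
}

/** Hypothetical mojo declaring its inputs in the shape suggested in point 3. */
@MojoInput({"${project.build.outputDirectory}", "${project.artifacts}", "method:getState()"})
class ExampleFilteringMojo {
    /** Hypothetical extra state (e.g. a build timestamp) that is not visible as a file. */
    public String getState() {
        return "state";
    }
}

/** What a hypothetical core-side checker might read before deciding whether to skip a mojo. */
final class MojoInputs {
    private MojoInputs() {}

    static List<String> declaredInputs(Class<?> mojoClass) {
        MojoInput inputs = mojoClass.getAnnotation(MojoInput.class);
        // No annotation means nothing is declared, so the mojo must always run (the conservative default).
        return inputs == null ? List.of() : Arrays.asList(inputs.value());
    }
}
```

Treating an unannotated mojo as "always run" matches the conservative default argued for in point 2: only mojos that explicitly declare their inputs could ever be skipped.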

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>


On Sat, 25 Apr 2020 at 16:25, Robert Oxspring <ro...@imapmail.org>
wrote:

> Hi
>
> “mvn clean install” is definitely normal for small projects like maven
> plugins because it’s a cheap way of guaranteeing your build is up to date.
>
> For developers working on larger projects “mvn clean install” is typically
> avoided wherever possible as it forces everything to be rebuilt even when
> just a couple of lines of code have been changed. Instead folks rely on
> “mvn install” doing the least work practical to incrementally bring the
> target directory up to date.
>
> In complex multi-module builds it’s common for one module’s filtered
> resources to be treated as inputs to later processes. At the moment, the
> filtering plug-in unconditionally overwrites resources such that the
> filtered output file is newly modified. This means that downstream
> processes see their inputs as newly modified and have to assume their
> outputs are outdated and need to be rebuilt.
>
> (Of course, in CI environments the opposite tradeoff is normal: accept
> slower builds in return for guaranteed clean reliable ones!)
>
> At work, a “mvn install -DskipTests” on our multi-module project takes
> minutes even when nothing has changed, when I think it could be finished in
> seconds. So far I’ve identified the filtering and assembly plugins as
> triggers of unnecessary work and am trying to offer fixes to optimise that
> behaviour!
>
> Does that context help at all?
>
> Thanks,
>
> Rob
>
> > On 25 Apr 2020, at 14:37, Slawomir Jaranowski <s....@gmail.com>
> wrote:
> >
> > Hi,
> >
> > Can you describe your case and what you want to achieve?
> >
> > By default, all files created while Maven runs are written to the target
> > directory, and in most cases the target directory is cleaned before a new
> > build starts.
> > Usually Maven is run with:
> >  mvn clean install.
> >
> > On Sat, 25 Apr 2020 at 00:59, Robert Oxspring <ro...@imapmail.org>
> > wrote:
> >
> >> Hi all,
> >>
> >>
> >> When copying resources with filtering on, files are always overwritten
> >> even when the filters have not changed. I’d like to change this such that
> >> repeated filtering copies do not modify the destination file.
> >>
> >> I’ve prepared a change to write the filtered content to a temporary file
> >> and only rename that over the destination if they’re different:
> >>
> >> https://github.com/apache/maven-filtering/compare/master...roxspring:feature/avoid-overwrite-on-no-op-filter
> >>
> >> Would something like this be acceptable? I’m guessing the next step is to
> >> create an issue in Jira? Against MNG??
> >>
> >> Thanks,
> >>
> >> Rob
> >>
> >>
> >
> > --
> > Sławomir Jaranowski
>
>
>
>

Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Karl Heinz Marbaise <kh...@gmx.de>.
Hi,

On 25.04.20 16:25, Robert Oxspring wrote:
> Hi
>
> “mvn clean install” is definitely normal for small projects like maven plugins because it’s a cheap way of guaranteeing your build is up to date.

What does install guarantee which verify does not?

>
> For developers working on larger projects “mvn clean install” is typically avoided wherever possible as it forces everything to be rebuilt even when just a couple of lines of code have been changed. Instead folks rely on “mvn install” doing the least work practical to incrementally bring the target directory up to date.

The target directory is updated by "verify" as well... no need for install...

The rebuild is caused by the "clean"... nothing else... and that is not necessary...


>
> In complex multi-module builds it’s common for one module’s filtered resources to be treated as inputs to later processes. At the moment, the filtering plug-in unconditionally overwrites resources such that the filtered output file is newly modified. This means that downstream processes see their inputs as newly modified and have to assume their outputs are outdated and need to be rebuilt.

The downstream steps are usually jars, which are not related to other
modules, which means they don't need to be updated (depending on the
change it might be necessary to rebuild them), but not because of resources...
since resources are only local to the module....

Building only what is needed can be achieved in a multi-module build by
using: mvn -pl xy -amd

>
> (Of course, in CI environments the opposite tradeoff is normal: accept slower builds in return for guaranteed clean reliable ones!)

Most of the time the problem here is not the "clean"; it is that builds
today often run in cloud environments where a different node is used for
each build... which means nothing can be kept (some tools try to handle
that, like Travis CI, CircleCI etc.; Jenkins also has such an option)...
But that depends on the CI solution you are using...

>
> At work, a “mvn install -DskipTests” on our multi-module project takes minutes even when nothing has changed,

The questions are: are you using the most recent versions of all Maven
plugins, and how many modules/classes need to be compiled? There might
also be an issue with maven-compiler-plugin, which is a different story...

I have several projects: one with ca. 45,000 lines of code takes 35
seconds to build with "-DskipTests"... etc.

A test project which I use for performance testing: a usual build via
"mvn verify" takes 45 seconds (2000 modules)... (at the moment this
project does not contain resources, but that could be a good test case...)

Building it a second time takes 28 seconds... incremental parts are
already working... but of course they can be improved...


> when I think it could be finished in seconds. So far I’ve identified the filtering and assembly plugins as triggers of unnecessary work and am trying to offer fixes to optimise that behaviour!

By "filtering" you mean the maven-resources-plugin, which uses the
maven-filtering component...

>
> Does that context help at all?
>
> Thanks,
>
> Rob
>
>> On 25 Apr 2020, at 14:37, Slawomir Jaranowski <s....@gmail.com> wrote:
>>
>> Hi,
>>
>> Can you describe your case and what you want to achieve?
>>
>> By default, all files created while Maven runs are written to the target
>> directory, and in most cases the target directory is cleaned before a new
>> build starts.
>> Usually Maven is run with:
>>   mvn clean install.
>>
>> On Sat, 25 Apr 2020 at 00:59, Robert Oxspring <ro...@imapmail.org>
>> wrote:
>>
>>> Hi all,
>>>
>>>
>>> When copying resources with filtering on, files are always overwritten
>>> even when the filters have not changed. I’d like to change this such that
>>> repeated filtering copies do not modify the destination file.
>>>
>>> I’ve prepared a change to write the filtered content to a temporary file
>>> and only rename that over the destination if they’re different:
>>>
>>> https://github.com/apache/maven-filtering/compare/master...roxspring:feature/avoid-overwrite-on-no-op-filter
>>>
>>> Would something like this be acceptable? I’m guessing the next step is to
>>> create an issue in Jira? Against MNG??
>>>
>>> Thanks,
>>>
>>> Rob
>>>



Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Robert Oxspring <ro...@imapmail.org>.
Hi

“mvn clean install” is definitely normal for small projects like maven plugins because it’s a cheap way of guaranteeing your build is up to date.

For developers working on larger projects “mvn clean install” is typically avoided wherever possible as it forces everything to be rebuilt even when just a couple of lines of code have been changed. Instead folks rely on “mvn install” doing the least work practical to incrementally bring the target directory up to date. 

In complex multi-module builds it’s common for one module’s filtered resources to be treated as inputs to later processes. At the moment, the filtering plug-in unconditionally overwrites resources such that the filtered output file is newly modified. This means that downstream processes see their inputs as newly modified and have to assume their outputs are outdated and need to be rebuilt.
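The proposed fix, writing the filtered content to a temporary file and renaming it over the destination only when the two differ, can be sketched like this. It is a minimal illustration assuming Java 12+ (for `Files.mismatch`) and UTF-8 output; `WriteIfChanged` is a hypothetical name, and this is not the code on the linked branch or part of the maven-filtering API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: only replaces the destination when the filtered bytes actually differ. */
final class WriteIfChanged {
    private WriteIfChanged() {}

    /** Returns true if the destination was (re)written, false if it was left untouched. */
    static boolean write(Path destination, String filteredContent) throws IOException {
        Path dir = destination.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        // Create the temporary file next to the destination so the move stays on one file system.
        Path tmp = Files.createTempFile(dir, destination.getFileName().toString(), ".tmp");
        try {
            Files.writeString(tmp, filteredContent, StandardCharsets.UTF_8);
            // Files.mismatch returns -1 when both files have identical content.
            if (Files.exists(destination) && Files.mismatch(tmp, destination) == -1L) {
                return false; // Identical: keep the old file and, crucially, its timestamp.
            }
            Files.move(tmp, destination, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } finally {
            // No-op after a successful move; cleans up when the content was unchanged or on error.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because an unchanged destination keeps its old modification time, downstream steps that compare timestamps no longer see it as new. An in-memory variant, as discussed elsewhere in the thread, would compare a buffer against the existing file instead of writing a temporary file.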

(Of course, in CI environments the opposite tradeoff is normal: accept slower builds in return for guaranteed clean reliable ones!)

At work, a “mvn install -DskipTests” on our multi-module project takes minutes even when nothing has changed, when I think it could be finished in seconds. So far I’ve identified the filtering and assembly plugins as triggers of unnecessary work and am trying to offer fixes to optimise that behaviour!

Does that context help at all?

Thanks,

Rob

> On 25 Apr 2020, at 14:37, Slawomir Jaranowski <s....@gmail.com> wrote:
> 
> Hi,
> 
> Can you describe your case and what you want to achieve?
> 
> By default, all files created while Maven runs are written to the target
> directory, and in most cases the target directory is cleaned before a new
> build starts.
> Usually Maven is run with:
>  mvn clean install.
> 
> On Sat, 25 Apr 2020 at 00:59, Robert Oxspring <ro...@imapmail.org>
> wrote:
> 
>> Hi all,
>> 
>> 
>> When copying resources with filtering on, files are always overwritten
>> even when the filters have not changed. I’d like to change this such that
>> repeated filtering copies do not modify the destination file.
>> 
>> I’ve prepared a change to write the filtered content to a temporary file
>> and only rename that over the destination if they’re different:
>> 
>> https://github.com/apache/maven-filtering/compare/master...roxspring:feature/avoid-overwrite-on-no-op-filter
>> 
>> Would something like this be acceptable? I’m guessing the next step is to
>> create an issue in Jira? Against MNG??
>> 
>> Thanks,
>> 
>> Rob
>> 
>> 
> 
> -- 
> Sławomir Jaranowski




Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Karl Heinz Marbaise <kh...@gmx.de>.
Hi,

On 25.04.20 15:36, Slawomir Jaranowski wrote:
> Hi,
>
> Can you describe your case and what you want to achieve?
>
> By default, all files created while Maven runs are written to the target
> directory, and in most cases the target directory is cleaned before a new
> build starts.
> Usually Maven is run with:
>    mvn clean install.

This is what you usually shouldn't use.

Install is only needed if you want to reuse the produced artifacts in
another project on your own computer; otherwise it's wasted...

It's sufficient:

     mvn verify

Also, clean can be avoided most of the time... and that's what
Robert is referring to...

Kind regards
Karl Heinz Marbaise
>
> On Sat, 25 Apr 2020 at 00:59, Robert Oxspring <ro...@imapmail.org>
> wrote:
>
>> Hi all,
>>
>>
>> When copying resources with filtering on, files are always overwritten
>> even when the filters have not changed. I’d like to change this such that
>> repeated filtering copies do not modify the destination file.
>>
>> I’ve prepared a change to write the filtered content to a temporary file
>> and only rename that over the destination if they’re different:
>>
>> https://github.com/apache/maven-filtering/compare/master...roxspring:feature/avoid-overwrite-on-no-op-filter
>>
>> Would something like this be acceptable? I’m guessing the next step is to
>> create an issue in Jira? Against MNG??
>> Thanks,
>>
>> Rob



Re: Maven Filtering Plugin should avoid overwriting even when filtering

Posted by Slawomir Jaranowski <s....@gmail.com>.
Hi,

Can you describe your case and what you want to achieve?

By default, all files created while Maven runs are written to the target
directory, and in most cases the target directory is cleaned before a new
build starts.
Usually Maven is run with:
  mvn clean install.

On Sat, 25 Apr 2020 at 00:59, Robert Oxspring <ro...@imapmail.org>
wrote:

> Hi all,
>
>
> When copying resources with filtering on, files are always overwritten
> even when the filters have not changed. I’d like to change this such that
> repeated filtering copies do not modify the destination file.
>
> I’ve prepared a change to write the filtered content to a temporary file
> and only rename that over the destination if they’re different:
>
> https://github.com/apache/maven-filtering/compare/master...roxspring:feature/avoid-overwrite-on-no-op-filter
>
> Would something like this be acceptable? I’m guessing the next step is to
> create an issue in Jira? Against MNG??
>
> Thanks,
>
> Rob
>
>

-- 
Sławomir Jaranowski