Posted to dev@maven.apache.org by Benjamin Bentmann <be...@udo.edu> on 2008/03/21 20:15:51 UTC

Maven and File Encoding

Hi,

There are still several code spots in Maven that rely on the platform's 
default encoding when processing text files, which does not support build 
reproducibility. It could help developers to detect such defects if the CI 
machines were configured to run Maven with

  MAVEN_OPTS=-Dfile.encoding=UTF-16

This setting will wreak havoc on any component that assumes ASCII, UTF-8 or 
Latin-1 when converting between characters and bytes.
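
As a rough illustration of the kind of defect this setting exposes, here is a
minimal sketch. The class and method names are made up for this example and
are not taken from Maven. It encodes a string as ASCII and then decodes the
bytes once with the matching charset and once with UTF-16, which is roughly
what happens when code silently falls back to a UTF-16 platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Maven or any of its plugins.
final class EncodingMismatchDemo {

    // Encodes the text with one charset and decodes the bytes with another.
    static String transcode(String text, Charset writeWith, Charset readWith) {
        byte[] bytes = text.getBytes(writeWith);
        return new String(bytes, readWith);
    }

    public static void main(String[] args) {
        String original = "encoding";
        String same = transcode(original, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII);
        String mixed = transcode(original, StandardCharsets.US_ASCII, StandardCharsets.UTF_16);
        System.out.println("matching charsets preserved the text: " + original.equals(same));
        System.out.println("ASCII bytes read as UTF-16 preserved the text: " + original.equals(mixed));
    }
}
```

Any component that passes an explicit charset on both sides behaves like the
first call regardless of -Dfile.encoding; a component that relies on the
default behaves like the second one under the proposed CI setting.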

Ciao,


Benjamin 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org


Re: Maven and File Encoding

Posted by Hervé BOUTEMY <he...@free.fr>.
On Saturday 29 March 2008, Benjamin Bentmann wrote:
> > The convention has to be shared between every plugin and the super POM. I
> > dislike the pure formal convention, that every plugin would copy/paste.
> > Coded as an API like "String checkSourceEncoding(String encoding)" method
> > to
> > AbstractMojo, that every plugin would call: "encoding =
> > checkSourceCodeEncoding(encoding)" is ok for me
>
> +1, on the general idea with the helper method to materialize the
> convention. I only fear that AbstractMojo is not a good candidate to host
> this method: AbstractMojo is part of the uber JAR and hence plugins cannot
> use a different version of this class until they increase their
> prerequisite on Maven to 2.0.10+.
oh yes, I forgot this.

> Maybe we could drop the method somewhere 
> into plexus-utils' ReaderFactory?
if such a method goes into plexus-utils, it has to be more generic.
checkDefaultEncoding?
Then the default value should be configurable, and not only hardcoded. But 
how?

>
>
> Benjamin
>
>


Re: Maven and File Encoding

Posted by Benjamin Bentmann <be...@udo.edu>.
> > Otherwise, plugins should by convention agree to handle a null value for
> > the encoding parameter to denote some fixed encoding. This way, 
> > upgrading
> > Maven will not affect the default encoding behavior.
> The convention has to be shared between every plugin and the super POM. I
> dislike the pure formal convention, that every plugin would copy/paste.
> Coded as an API like "String checkSourceEncoding(String encoding)" method 
> to
> AbstractMojo, that every plugin would call: "encoding =
> checkSourceCodeEncoding(encoding)" is ok for me

+1, on the general idea with the helper method to materialize the 
convention. I only fear that AbstractMojo is not a good candidate to host 
this method: AbstractMojo is part of the uber JAR and hence plugins cannot 
use a different version of this class until they increase their prerequisite 
on Maven to 2.0.10+. Maybe we could drop the method somewhere into 
plexus-utils' ReaderFactory?


Benjamin 




Re: Maven and File Encoding

Posted by Hervé BOUTEMY <he...@free.fr>.
On Saturday 29 March 2008, Benjamin Bentmann wrote:
> > core plugins to be modified:
>
> I moved this list over to the wiki article [0] in Confluence, think it's
> easier to maintain there instead of being splattered through this mailing
> thread.
great
I added a link in http://docs.codehaus.org/display/MAVEN/Home and 
http://docs.codehaus.org/display/MAVEN/All+Proposals
When the proposal is ready, we'll call for a vote.

> > should we provide ${encoding} or ${sourceEncoding}?
>
> Don't know, it's the question of ease of typing vs. expressiveness. The
> tools javac and javadoc have their corresponding cli parameters named
> "encoding", too. On the other hand, how often do we expect users to really
> specify this parameter via the cli instead of permanently configuring the
> POM?
true: let's stay with encoding

> platform-dependent encoding. I would rather prefer that the behavior of a
> plugin depends only on its own version and not also on the executing Maven
> version to ease reproducibility.
right

> So I believe there are two choices:
> If your initial concerns about backwards compatibility meet general
> consensus, the super POM should simply not define a specific property for
> ${project.build.sourceEncoding} such that users get platform-dependent
> behavior as now by default (not my favorite).
> Otherwise, plugins should by convention agree to handle a null value for
> the encoding parameter to denote some fixed encoding. This way, upgrading
> Maven will not affect the default encoding behavior.
The convention has to be shared between every plugin and the super POM. I 
dislike a purely formal convention that every plugin would copy/paste.
Coding it as an API, such as a "String checkSourceEncoding(String encoding)" 
method added to AbstractMojo that every plugin would call as "encoding = 
checkSourceEncoding(encoding)", is ok for me

>
>
> Benjamin
>
>
> [0]
> http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding
>
>


Re: Maven and File Encoding

Posted by Benjamin Bentmann <be...@udo.edu>.
> core plugins to be modified:

I moved this list over to the wiki article [0] in Confluence, think it's
easier to maintain there instead of being splattered through this mailing
thread.

> > This in place, plugin parameters could be written like
> >
> >   /**
> >    * @parameter expression="${encoding}"
> >    *            default-value="${project.build.sourceEncoding}"
> >    */
> >   private String encoding;
> >
> > i.e. still provide some short expression name for overrides from the
> > cli.
> should we provide ${encoding} or ${sourceEncoding}?

Don't know, it's the question of ease of typing vs. expressiveness. The
tools javac and javadoc have their corresponding cli parameters named
"encoding", too. On the other hand, how often do we expect users to really
specify this parameter via the cli instead of permanently configuring the
POM?

> > Of course, this would require the plugin to add a manual check whether
> > the
> > default-value expression actually was existent or whether an older Maven
> > version is running.
> I'm not convinced here: just let null=platform encoding, as it has been
> the case previously

If I understand you correctly, this would make the encoding behavior
dependent on the currently used Maven version, wouldn't it? Let's assume
Maven 2.0.10+ defines the property ${project.build.sourceEncoding} in the
super POM with a specific value like Latin-1, as agreed so far. Then, any
plugin that populates its encoding parameter via this property by means of
the default-value annotation will use that specific encoding. However, if
the same plugin version is run by Maven 2.0.9 or older, it will receive null
for the encoding parameter and, as you suggested, fall back to a
platform-dependent encoding. I would prefer that the behavior of a plugin
depends only on its own version and not also on the executing Maven version,
to ease reproducibility.

So I believe there are two choices:
If your initial concerns about backwards compatibility meet general
consensus, the super POM should simply not define a specific property for
${project.build.sourceEncoding} such that users get platform-dependent
behavior as now by default (not my favorite).
Otherwise, plugins should by convention agree to handle a null value for the
encoding parameter to denote some fixed encoding. This way, upgrading Maven
will not affect the default encoding behavior.
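
To make the second choice a bit more concrete, here is a minimal sketch of
what such a plugin-side convention could look like. The class name, the method
name and the choice of ISO-8859-1 as the fixed value are assumptions for
illustration only; nothing like this exists in Maven or plexus-utils as of
this mail.

```java
// Hypothetical helper, not an existing Maven or plexus-utils API.
final class SourceEncodingConvention {

    // Assumed fixed fallback, following the Latin-1 value discussed in this thread.
    static final String FIXED_DEFAULT = "ISO-8859-1";

    // Returns the configured encoding, or the fixed default when the parameter
    // was left unset (null or blank), for example because an older Maven did
    // not define ${project.build.sourceEncoding}.
    static String resolve(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return FIXED_DEFAULT;
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));
        System.out.println(resolve("UTF-8"));
    }
}
```

With a check like this in the plugin itself, the effective encoding no longer
depends on which Maven version happens to run the plugin.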


Benjamin


[0]
http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding




Re: Maven and File Encoding

Posted by Hervé BOUTEMY <he...@free.fr>.
On Wednesday 26 March 2008, Benjamin Bentmann wrote:
> > What about just unifying the expressions that refer to encoding in
> > plugins?
>
> As an intermediate solution until Maven 2.1 provides an extended POM, this
> seems like a good approach.
+1
core plugins to be modified:
- compiler
- javadoc
- resources
- jxr
- pmd
do you see other ones?

TODO: same list for Mojo project plugins

>
> > javadoc-plugin has ${encoding}
> > compiler-plugin has ${maven.compiler.encoding}
> > resources-plugin has no expression defined.
>
> My suggestion: ${project.build.sourceEncoding} or similar, i.e. have the
> expression match the yet to introduce new POM element for the encoding.
> This way, plugins using the expression would be forward-compatible with the
> extended POM and automatically use the new POM element once introduced.
+1, both for the prefix project.build and for the attribute name 
sourceEncoding

just for the record, ${project.reporting.outputEncoding} for every report 
plugin could be useful, but that's another story: we'll start another thread 
on it later...

> We could also consider to add
>
>   <properties>
>     <project.build.sourceEncoding>ISO-8859-1</project.build.sourceEncoding>
>   </properties>
>
> to the super POM for everyone to inherit.
+1: non-ISO-8859-1 builds will break, but the fix is easy now that there is 
a unified property, so I find it acceptable

> This in place, plugin parameters could be written like
>
>   /**
>    * @parameter expression="${encoding}"
>    *            default-value="${project.build.sourceEncoding}"
>    */
>   private String encoding;
>
> i.e. still provide some short expression name for overrides from the cli.
should we provide ${encoding} or ${sourceEncoding}?

> Of course, this would require the plugin to add a manual check whether the
> default-value expression actually was existent or whether an older Maven
> version is running.
I'm not convinced here: just let null=platform encoding, as it has been the 
case previously
or perhaps add a warning, through a "checkSourceEncoding(String encoding)" 
method in AbstractMojo for use in every plugin?

>
>
> Benjamin
>
>


Re: Maven and File Encoding

Posted by Benjamin Bentmann <be...@udo.edu>.
> What about just unifying the expressions that refer to encoding in 
> plugins?

As an intermediate solution until Maven 2.1 provides an extended POM, this 
seems like a good approach.

> javadoc-plugin has ${encoding}
> compiler-plugin has ${maven.compiler.encoding}
> resources-plugin has no expression defined.

My suggestion: ${project.build.sourceEncoding} or similar, i.e. have the 
expression match the new POM element for the encoding that is yet to be 
introduced. This way, plugins using the expression would be forward-compatible 
with the extended POM and automatically use the new POM element once introduced.

We could also consider adding

  <properties>
    <project.build.sourceEncoding>ISO-8859-1</project.build.sourceEncoding>
  </properties>

to the super POM for everyone to inherit. This in place, plugin parameters 
could be written like

  /**
   * @parameter expression="${encoding}"
   *            default-value="${project.build.sourceEncoding}"
   */
  private String encoding;

i.e. still provide some short expression name for overrides from the cli. Of 
course, this would require the plugin to add a manual check whether the 
default-value expression actually existed or whether an older Maven 
version is running.


Benjamin 




Re: Maven and File Encoding

Posted by Milos Kleint <mk...@gmail.com>.
I've encountered this in the IDE integration for NetBeans when I want
to allow users to set the encoding for the project. Currently I'm setting
the configuration for compiler-plugin and resources-plugin, but that's
cumbersome.


On Wed, Mar 26, 2008 at 11:37 AM, Benjamin Bentmann
<be...@udo.edu> wrote:
> > There is one feature in this patch I don't like: you hardcoded a default
>  > encoding to ISO-8859-1 instead of no default value (which means platform
>  > encoding).
>
>  Indeed, I also suggested using a fixed default value for other plugins:
>  - MCOMPILER-63
>  - MJAVADOC-165
>  - MRESOURCES-57
>
>  I simply chose Latin-1 here to be consistent with the Site Plugin whose
>  inputEncoding/outputEncoding likewise defaults to Latin-1 for quite some
>  time.
>
>
>  > even if a developer wanted to configure platform encoding, he could not!
>
>  Something like
>   <encoding>${file.encoding}</encoding>
>  should still do, wouldn't it?

adding elements to pom.xml would be an incompatible change, however, and
would take a long time to get into production. What about just unifying
the expressions that refer to encoding in plugins?
javadoc-plugin has ${encoding}
compiler-plugin has ${maven.compiler.encoding}
resources-plugin has no expression defined.

If all places defined a common name, we would be fine, right? you
would just define the right property in the pom..

Regards


Milos



Re: Maven and File Encoding

Posted by Benjamin Bentmann <be...@udo.edu>.
> There is one feature in this patch I don't like: you hardcoded a default
> encoding to ISO-8859-1 instead of no default value (which means platform
> encoding).

Indeed, I also suggested using a fixed default value for other plugins:
- MCOMPILER-63
- MJAVADOC-165
- MRESOURCES-57

I simply chose Latin-1 here to be consistent with the Site Plugin whose
inputEncoding/outputEncoding likewise defaults to Latin-1 for quite some
time.

> even if a developer wanted to configure platform encoding, he could not!

Something like
  <encoding>${file.encoding}</encoding>
should still do, wouldn't it?

> Platform encoding is here to give most developers freedom to ignore
> encoding notions. And most of the time it works well.
> Every native tool use platform encoding by default, so do javac, javadoc,
> ant,...

You basically describe the status quo for the encoding handling. Now I
believe that some improvements simply require breaking with existing
standards to get away from questionable decisions made in the past. For
instance, Maven 2.0.8 and Surefire 2.3.1 introduced deterministic/correct
class path ordering. This caused some people's builds to fail but
nevertheless I believe this change was the only way to go.

Considering the following circumstances
 a) Java has an international audience and
 b) developer communities bring together people from different countries
I believe that to "ignore encoding notions" is an anti-pattern that should
be banned, not supported. Also, I guess that it's partly this ignorance that
caused and still causes all the pain with proper encoding support in Maven:
If developers had been required to explicitly specify file encodings
for tools like javac or classes like FileWriter/FileReader, I believe that
would have made them sensitive to the topic and could have prevented some of
the design/implementation flaws we see right now.
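
For what "explicitly specifying the file encoding" can look like in code, here
is a small sketch. It assumes a current JDK (java.nio.file did not exist when
this thread was written) and uses made-up class and method names. Unlike the
FileWriter/FileReader constructors that take only a file, every call below
names its charset.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the names are illustrative only.
final class ExplicitEncodingIo {

    // Writes the text to a temporary file and reads the first line back,
    // using the given charset for both directions instead of the platform default.
    static String roundTrip(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                try (BufferedWriter writer = Files.newBufferedWriter(file, charset)) {
                    writer.write(text);
                }
                try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
                    return reader.readLine();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The non-ASCII character is written as a Unicode escape so that this
        // source file itself does not depend on the compiler's encoding.
        String text = "caf\u00e9";
        System.out.println(text.equals(roundTrip(text, StandardCharsets.ISO_8859_1)));
    }
}
```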

Personally, I'm a fan of these Maven philosophies:
- ensure builds are reproducible
- prefer convention over configuration
If I apply those to the encoding issue as well (as a matter of consistency),
the conclusion is to have Maven provide a fixed default value for the
encoding.

> Having a default encoding is not consistent, and people not using
> ISO-8859-1 as their platform encoding won't understand the failure (or
> understand this is a complicated encoding problem caused by a bad tool).

Also note that Maven already chose to be inconsistent with native tools
regarding compiler source/target settings. The native javac defaults these
options according to its own JVM while Maven uses fixed default values.
Those defaults caused compilation failures for my builds and made me spend
some minutes fixing the POM. However, I wouldn't blame Maven here for being
inconsistent; I'm happy it taught me to make my build platform-independent.

> To help developers discover that relying on platform encoding is not a
> good choice, I think there are better ways than stopping the "magic" of
> it:

I feel this is similar to the auto-update "feature" for Maven plugins: I'm
not sure, but wasn't the consensus about this to disable auto-update by
requiring the existence of <version> elements for plugins in the POM for
Maven 2.1? This would also mean a stop of "magic", though in a
smoother/expected way, since the POM version would change and as such has
every right to define different validation semantics.

> - I just changed resources plugin with MRESOURCES-62 to be more explicit:
> > [INFO] [resources:resources]
> > [INFO] Using platform encoding (UTF-16 actually) to copy filtered
> > resources.

Side note: I would prefer to output the last log message on level "WARNING".
An INFO message is ordinary output noise; nobody will care about it,
leaving the POM as bad as it is.

> - perhaps Maven itself should show the good habits: define a property in
> the parent pom and take care of properly declaring encoding for plugins

I agree here, Maven should definitely have a central property in the POM
to control encoding for an entire build. See also MNG-2216.


Benjamin




Re: Maven and File Encoding

Posted by Hervé BOUTEMY <he...@free.fr>.
> I haven't had yet the time to setup some nice IT for this, but the fix is
> here: MPLUGIN-101.
great!

There is one feature in this patch I don't like: you hardcoded a default 
encoding to ISO-8859-1 instead of no default value (which means platform 
encoding).
I understand that platform encoding is bad for reproducibility, and you avoid 
platform encoding with this default value: even if a developer wanted to 
configure platform encoding, he could not!

But I don't think it is a good choice.

Platform encoding is here to give most developers the freedom to ignore 
encoding notions. And most of the time it works well.
Every native tool uses platform encoding by default, and so do javac, javadoc, 
ant,... Having a default encoding is not consistent, and people not using 
ISO-8859-1 as their platform encoding won't understand the failure (or 
understand this is a complicated encoding problem caused by a bad tool).


To help developers discover that relying on platform encoding is not a good 
choice, I think there are better ways than stopping the "magic" of it:

- I just changed resources plugin with MRESOURCES-62 to be more explicit:
> [INFO] [resources:resources]
> [INFO] Using platform encoding (UTF-16 actually) to copy filtered resources.
Now the developer knows he is using platform encoding, and he even knows which 
one is *actually* used. Then setting a configuration is not a problem: just do 
it (no more questions like "what is your platform encoding?")

- perhaps there should be an enforcer rule to check that common plugins are 
configured with explicit encoding.

- perhaps Maven itself should show the good habits: define a property in the 
parent pom and take care of properly declaring encoding for plugins
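
For the first suggestion, the idea behind the more explicit log output can be
sketched roughly as follows. This is not the actual resources plugin code: the
class and method names are hypothetical, the wording of the second message is
my own, and Charset.defaultCharset() is only assumed here as the way to find
the platform encoding.

```java
import java.nio.charset.Charset;

// Hypothetical sketch, not the code committed for MRESOURCES-62.
final class PlatformEncodingNotice {

    // Builds a log message that tells the user whether an explicit encoding is
    // configured, and names the platform encoding when the build relies on it.
    static String describe(String configured) {
        if (configured == null || configured.isEmpty()) {
            return "Using platform encoding (" + Charset.defaultCharset().name()
                    + " actually) to copy filtered resources.";
        }
        return "Using '" + configured + "' encoding to copy filtered resources.";
    }

    public static void main(String[] args) {
        System.out.println("[INFO] " + describe(null));
        System.out.println("[INFO] " + describe("UTF-8"));
    }
}
```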


regards,

Hervé



Re: Maven and File Encoding

Posted by Hervé BOUTEMY <he...@free.fr>.
that's it
thanks for the precise explanations

On Saturday 29 March 2008, Stuart McCulloch wrote:
> On 29/03/2008, Hervé BOUTEMY <he...@free.fr> wrote:
> > it should fail as soon as testitMNG3473 (the second test actually)
> >
> > I copied the shell script used by Hudson to launch tests on my machine
> > and tried it: same problem as the CI server, the IT tests don't fail
> >
> > I think I found the cause:
> > - I already have a MAVEN_OPTS environment variable
> > - Hudson sets the new variable in 2 steps: "set MAVEN_OPTS=..." then
> > "export
> > MAVEN_OPTS"
> > Trying these 2 commands on the console, I found that the new MAVEN_OPTS
> > value
> > is ignored, the previous value is still here
> > That is not the case if I write "export MAVEN_OPTS=..."
>
> FYI, using "set MAVEN_OPTS=..." won't actually set the variable on
> linux/unix
> because the builtin "set" command is actually used to configure shell
> options,
> not update environment variables (not to be confused with set on Windows!)
>
> in fact "set MAVEN_OPTS=..." actually sets $1 to be "MAVEN_OPTS=..." :)
>
> also the "export MAVEN_OPTS=..." syntax doesn't work on all linux/unix
> shells,
> most notably ksh, so the best way to set and export an environment variable
> is:
>
>     MAVEN_OPTS=...
>     export MAVEN_OPTS
>
> HTH
>
> I imagine this is the same problem with the CI server.
>
> > Can the conf be changed?
> > Can I have karma on it to try? I created a login: hboutemy.
> >
> > regards,
> >
> > Hervé
> >
> > On Tuesday 25 March 2008, Brian E. Fox wrote:
> > > Nope, nothing funky about the maven on there, the base 2.0 is the 2.0.8
> > > package.
> > >
> > > -----Original Message-----
> > > From: Benjamin Bentmann [mailto:benjamin.bentmann@udo.edu]
> > > Sent: Tuesday, March 25, 2008 5:02 PM
> > > To: Maven Developers List
> > > Subject: Re: Maven and File Encoding
> > >
> > > > Assuming I didn't mess up the config (you can see the execution at
> > > > the beginning of the console output) it seems to be running without
> > > > any errors so far.
> > >
> > > Hm, both theory and practice tell me that reading ASCII files with
> > > UTF-16
> > > encoding is a rather bad idea, so the build must fail if properly
> > > configured.
> > >
> > > After some investigation, it appears that my initial suggestion to
> > > simply
> > > set MAVEN_OPTS does not really work. For example, from the Surefire XML
> > > report [0] I read
> > >   <property value="ANSI_X3.4-1968" name="file.encoding"/>
> > > so Maven is still using ASCII.
> > >
> > > One part of this problem could be all the process forking done during
> > > the
> > > tests: If I count properly, there is one fork by Surefire for the whole
> > > suite and one additional fork once per Maven invocation by the
> > > Verifier. The
> > > challenge is to get the -Dfile.encoding setting down all this road. The
> > > MAVEN_OPTS var simply isn't pushed through all the sub environments.
> > >
> > > What I could not figure out is why the root invocation of
> > > maven/2.0./bin succeeded in the first place. That invocation should
> > > have respected the exported MAVEN_OPTS var and as such should have
> > > broke immediately due to PLX-367. Is the build using a customized run
> > > script that does not care about
> > > MAVEN_OPTS? Just curious, in the end it's quite desirable to have the
> > > root
> > > Maven process use a safe environment/encoding since we really want to
> > > test
> > > the other Maven executable.
> > >
> > >
> > > Benjamin
> > >
> > >
> > > [0]
> > > https://ci.sonatype.org/job/Maven-2.0.x-ITs-UTF-16/ws/maven-core-its/target/surefire-reports/TEST-org.apache.maven.its.Suite.xml
> > >
> > >


Re: Maven and File Encoding

Posted by Hervé BOUTEMY <he...@free.fr>.
On Saturday 29 March 2008, Brian E. Fox wrote:
> Hi Herve, Yes those can be changed as I used the shell execution to try
> and force the options. You should be able to edit directly with your
> account.
>
>

Thank you
I changed the configuration of the 2 projects.
And now Maven-2.0.x-ITs-UTF-16 is failing, as expected: the havoc is here! ;)

Time to work on fixes...



Re: Maven and File Encoding

Posted by Stuart McCulloch <st...@jayway.net>.
On 29/03/2008, Hervé BOUTEMY <he...@free.fr> wrote:
>
> it should fail as soon as testitMNG3473 (the second test actually)
>
> I copied the shell script used by Hudson to launch tests on my machine and
> tried it: same problem as the CI server, the IT tests don't fail
>
> I think I found the cause:
> - I already have a MAVEN_OPTS environment variable
> - Hudson sets the new variable in 2 steps: "set MAVEN_OPTS=..." then
> "export
> MAVEN_OPTS"
> Trying these 2 commands on the console, I found that the new MAVEN_OPTS
> value
> is ignored, the previous value is still here
> That is not the case if I write "export MAVEN_OPTS=..."


FYI, using "set MAVEN_OPTS=..." won't actually set the variable on linux/unix
because the builtin "set" command is actually used to configure shell options,
not update environment variables (not to be confused with set on Windows!)

in fact "set MAVEN_OPTS=..." actually sets $1 to be "MAVEN_OPTS=..." :)

also the "export MAVEN_OPTS=..." syntax doesn't work on all linux/unix shells,
most notably ksh, so the best way to set and export an environment variable is:

    MAVEN_OPTS=...
    export MAVEN_OPTS

HTH

I imagine this is the same problem with the CI server.
> Can the conf be changed?
> Can I have karma on it to try? I created a login: hboutemy.
>
> regards,
>
> Hervé
>
> On Tuesday 25 March 2008, Brian E. Fox wrote:
>
> > Nope, nothing funky about the maven on there, the base 2.0 is the 2.0.8
> > package.
> >
> > -----Original Message-----
> > From: Benjamin Bentmann [mailto:benjamin.bentmann@udo.edu]
> > Sent: Tuesday, March 25, 2008 5:02 PM
> > To: Maven Developers List
> > Subject: Re: Maven and File Encoding
> >
> > > Assuming I didn't mess up the config (you can see the execution at the
> > > beginning of the console output) it seems to be running without any
> > > errors so far.
> >
> > Hm, both theory and practice tell me that reading ASCII files with
> > UTF-16
> > encoding is a rather bad idea, so the build must fail if properly
> > configured.
> >
> > After some investigation, it appears that my initial suggestion to
> > simply
> > set MAVEN_OPTS does not really work. For example, from the Surefire XML
> > report [0] I read
> >   <property value="ANSI_X3.4-1968" name="file.encoding"/>
> > so Maven is still using ASCII.
> >
> > One part of this problem could be all the process forking done during
> > the
> > tests: If I count properly, there is one fork by Surefire for the whole
> > suite and one additional fork once per Maven invocation by the Verifier.
> > The
> > challenge is to get the -Dfile.encoding setting down all this road. The
> > MAVEN_OPTS var simply isn't pushed through all the sub environments.
> >
> > What I could not figure out is why the root invocation of maven/2.0./bin
> > succeeded in the first place. That invocation should have respected the
> > exported MAVEN_OPTS var and as such should have broke immediately due to
> > PLX-367. Is the build using a customized run script that does not care
> > about
> > MAVEN_OPTS? Just curious, in the end it's quite desirable to have the
> > root
> > Maven process use a safe environment/encoding since we really want to
> > test
> > the other Maven executable.
> >
> >
> > Benjamin
> >
> >
> > [0]
> > https://ci.sonatype.org/job/Maven-2.0.x-ITs-UTF-16/ws/maven-core-its/target/surefire-reports/TEST-org.apache.maven.its.Suite.xml
> >
> >
>
>


-- 
Cheers, Stuart

RE: Maven and File Encoding

Posted by "Brian E. Fox" <br...@reply.infinity.nu>.
Hi Herve, Yes those can be changed as I used the shell execution to try
and force the options. You should be able to edit directly with your
account.




Re: Maven and File Encoding

Posted by Hervé BOUTEMY <he...@free.fr>.
it should fail as soon as testitMNG3473 (the second test actually)

I copied the shell script used by Hudson to launch tests on my machine and 
tried it: same problem as the CI server, the IT tests don't fail

I think I found the cause:
- I already have a MAVEN_OPTS environment variable
- Hudson sets the new variable in 2 steps: "set MAVEN_OPTS=..." then "export 
MAVEN_OPTS"
Trying these 2 commands on the console, I found that the new MAVEN_OPTS value 
is ignored, the previous value is still here
That is not the case if I write "export MAVEN_OPTS=..."

I imagine this is the same problem with the CI server.
Can the conf be changed?
Can I have karma on it to try? I created a login: hboutemy.

regards,

Hervé

On Tuesday 25 March 2008, Brian E. Fox wrote:
> Nope, nothing funky about the maven on there, the base 2.0 is the 2.0.8
> package.
>
> -----Original Message-----
> From: Benjamin Bentmann [mailto:benjamin.bentmann@udo.edu]
> Sent: Tuesday, March 25, 2008 5:02 PM
> To: Maven Developers List
> Subject: Re: Maven and File Encoding
>
> > Assuming I didn't mess up the config (you can see the execution at the
> > beginning of the console output) it seems to be running without any
> > errors so far.
>
> Hm, both theory and practice tell me that reading ASCII files with UTF-16
> encoding is a rather bad idea, so the build must fail if properly
> configured.
>
> After some investigation, it appears that my initial suggestion to simply
> set MAVEN_OPTS does not really work. For example, from the Surefire XML
> report [0] I read
>   <property value="ANSI_X3.4-1968" name="file.encoding"/>
> so Maven is still using ASCII.
>
> One part of this problem could be all the process forking done during the
> tests: If I count properly, there is one fork by Surefire for the whole
> suite and one additional fork once per Maven invocation by the Verifier. The
> challenge is to get the -Dfile.encoding setting down all this road. The
> MAVEN_OPTS var simply isn't pushed through all the sub environments.
>
> What I could not figure out is why the root invocation of maven/2.0./bin
> succeeded in the first place. That invocation should have respected the
> exported MAVEN_OPTS var and as such should have broken immediately due to
> PLX-367. Is the build using a customized run script that does not care about
> MAVEN_OPTS? Just curious, in the end it's quite desirable to have the root
> Maven process use a safe environment/encoding since we really want to test
> the other Maven executable.
>
>
> Benjamin
>
>
> [0]
> https://ci.sonatype.org/job/Maven-2.0.x-ITs-UTF-16/ws/maven-core-its/target/surefire-reports/TEST-org.apache.maven.its.Suite.xml


RE: Maven and File Encoding

Posted by "Brian E. Fox" <br...@reply.infinity.nu>.
Nope, nothing funky about the maven on there, the base 2.0 is the 2.0.8
package.

-----Original Message-----
From: Benjamin Bentmann [mailto:benjamin.bentmann@udo.edu] 
Sent: Tuesday, March 25, 2008 5:02 PM
To: Maven Developers List
Subject: Re: Maven and File Encoding

> Assuming I didn't mess up the config (you can see the execution at the
> beginning of the console output) it seems to be running without any
> errors so far.

Hm, both theory and practice tell me that reading ASCII files with UTF-16
encoding is a rather bad idea, so the build must fail if properly
configured.

After some investigation, it appears that my initial suggestion to simply
set MAVEN_OPTS does not really work. For example, from the Surefire XML
report [0] I read
  <property value="ANSI_X3.4-1968" name="file.encoding"/>
so Maven is still using ASCII.

One part of this problem could be all the process forking done during the
tests: If I count properly, there is one fork by Surefire for the whole
suite and one additional fork once per Maven invocation by the Verifier. The
challenge is to get the -Dfile.encoding setting down all this road. The
MAVEN_OPTS var simply isn't pushed through all the sub environments.

What I could not figure out is why the root invocation of maven/2.0./bin
succeeded in the first place. That invocation should have respected the
exported MAVEN_OPTS var and as such should have broken immediately due to
PLX-367. Is the build using a customized run script that does not care about
MAVEN_OPTS? Just curious, in the end it's quite desirable to have the root
Maven process use a safe environment/encoding since we really want to test
the other Maven executable.


Benjamin


[0]
https://ci.sonatype.org/job/Maven-2.0.x-ITs-UTF-16/ws/maven-core-its/target/surefire-reports/TEST-org.apache.maven.its.Suite.xml


Re: Maven and File Encoding

Posted by Benjamin Bentmann <be...@udo.edu>.
> Assuming I didn't mess up the config (you can see the execution at the
> beginning of the console output) it seems to be running without any
> errors so far.

Hm, both theory and practice tell me that reading ASCII files with UTF-16
encoding is a rather bad idea, so the build must fail if properly
configured.
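
A small sketch of why this goes wrong, using only the JDK (the class name is
made up for illustration): ASCII bytes decoded as UTF-16 get paired up into
unrelated characters, so nothing downstream can recognize the content.

```java
import java.nio.charset.StandardCharsets;

public class AsciiAsUtf16 {

    // Decodes ASCII-encoded text as if it were UTF-16, the way a reader
    // relying on a UTF-16 platform default would.
    static String misread(String asciiText) {
        byte[] bytes = asciiText.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String original = "<project";        // 8 ASCII characters, 8 bytes
        String garbled = misread(original);  // each pair of bytes becomes one char
        System.out.println(original.length() + " chars in, " + garbled.length() + " chars out");
        System.out.println("round trip intact: " + original.equals(garbled));
    }
}
```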

After some investigation, it appears that my initial suggestion to simply
set MAVEN_OPTS does not really work. For example, from the Surefire XML
report [0] I read
  <property value="ANSI_X3.4-1968" name="file.encoding"/>
so Maven is still using ASCII.
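
For reference, ANSI_X3.4-1968 is just another name for US-ASCII. A quick check
with the JDK's charset lookup, as a sketch (the class name is illustrative
only), which also prints what the current JVM reports for itself:

```java
import java.nio.charset.Charset;

public class EncodingNameCheck {

    // Resolves a charset name or alias to the JDK's canonical charset name.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        System.out.println("ANSI_X3.4-1968 resolves to " + canonicalName("ANSI_X3.4-1968"));
        System.out.println("file.encoding property:    " + System.getProperty("file.encoding"));
        System.out.println("default charset:           " + Charset.defaultCharset().name());
    }
}
```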

One part of this problem could be all the process forking done during the
tests: If I count properly, there is one fork by Surefire for the whole
suite and one additional fork once per Maven invocation by the Verifier. The
challenge is to get the -Dfile.encoding setting down all this road. The
MAVEN_OPTS var simply isn't pushed through all the sub environments.
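
To make the idea of pushing the setting through each fork concrete, here is a
rough sketch of how a parent JVM could hand its own file.encoding to a child
JVM. The class, method and main-class names are hypothetical; this is not how
Surefire or the Verifier are actually implemented.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ForkWithEncoding {

    // Builds the command line for a child JVM that explicitly receives the
    // given file.encoding instead of falling back to the platform default.
    static List<String> childCommand(String encoding, String mainClass) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-Dfile.encoding=" + encoding);
        command.add("-cp");
        command.add(System.getProperty("java.class.path"));
        command.add(mainClass);
        return command;
    }

    public static void main(String[] args) {
        // Propagate whatever encoding this JVM was started with.
        String encoding = System.getProperty("file.encoding");
        List<String> command = childCommand(encoding, "org.example.Child"); // hypothetical class
        System.out.println(String.join(" ", command));
        // new ProcessBuilder(command).inheritIO().start() would launch the child.
    }
}
```

Each layer that forks would have to repeat this step, otherwise the child
silently falls back to the platform default.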

What I could not figure out is why the root invocation of maven/2.0./bin
succeeded in the first place. That invocation should have respected the
exported MAVEN_OPTS var and as such should have broken immediately due to
PLX-367. Is the build using a customized run script that does not care about
MAVEN_OPTS? Just curious, in the end it's quite desirable to have the root
Maven process use a safe environment/encoding since we really want to test
the other Maven executable.


Benjamin


[0]
https://ci.sonatype.org/job/Maven-2.0.x-ITs-UTF-16/ws/maven-core-its/target/surefire-reports/TEST-org.apache.maven.its.Suite.xml


RE: Maven and File Encoding

Posted by "Brian E. Fox" <br...@reply.infinity.nu>.
It's set up and also running the ITs.
https://ci.sonatype.org/job/Maven-2.0.x-ITs-UTF-16/3/console

Assuming I didn't mess up the config (you can see the execution at the
beginning of the console output) it seems to be running without any
errors so far.

-----Original Message-----
From: Benjamin Bentmann [mailto:benjamin.bentmann@udo.edu] 
Sent: Monday, March 24, 2008 6:34 PM
To: Maven Developers List
Subject: Re: Maven and File Encoding

> I can set up an alternate build for Maven. 

Cool, thanks Brian.

> What Java options do I need to set to change the encoding?

Setting the system property "file.encoding" should do:

 -Dfile.encoding=UTF-16

either via MAVEN_OPTS or directly on the java command line.


Benjamin


Re: Maven and File Encoding

Posted by Benjamin Bentmann <be...@udo.edu>.
> I can set up an alternate build for Maven. 

Cool, thanks Brian.

> What Java options do I need to set to change the encoding?

Setting the system property "file.encoding" should do:

 -Dfile.encoding=UTF-16

either via MAVEN_OPTS or directly on the java command line.


Benjamin


RE: Maven and File Encoding

Posted by "Brian E. Fox" <br...@reply.infinity.nu>.
I can set up an alternate build for Maven. What Java options do I need to
set to change the encoding?

-----Original Message-----
From: Benjamin Bentmann [mailto:benjamin.bentmann@udo.edu] 
Sent: Monday, March 24, 2008 6:12 PM
To: Maven Developers List
Subject: Re: Maven and File Encoding

> But if we do so on a whole CI server, I fear the havoc will be bigger than
> expected

I surely did not mean to do this immediately for the main CI builds. Given
Maven's current state, it's some future goal. Maybe we could set up a
secondary CI build job for this, so that people working on the encoding
issues can see how the work is progressing without affecting the usual CI
reports? Not sure what the infrastructure allows.

> and we'll need a lot of work to fix everything and have a stable situation

Well, you're not alone ;-) I already started to hunt down some occurrences
of FileWriter/FileReader which are prime indicators for wrong encoding
handling.

> I found that building any plugin fails: there are problems when scanning
> mojo.

I haven't yet had the time to set up a nice IT for this, but the fix is
here: MPLUGIN-101. Some related encoding problems for the plugin tools are
also already hanging around in JIRA.


Benjamin


Re: Maven and File Encoding

Posted by Benjamin Bentmann <be...@udo.edu>.
> But if we do so on a whole CI server, I fear the havoc will be bigger than
> expected

I surely did not mean to do this immediately for the main CI builds. Given
Maven's current state, it's some future goal. Maybe we could set up a
secondary CI build job for this, so that people working on the encoding
issues can see how the work is progressing without affecting the usual CI
reports? Not sure what the infrastructure allows.

> and we'll need a lot of work to fix everything and have a stable situation

Well, you're not alone ;-) I already started to hunt down some occurrences
of FileWriter/FileReader which are prime indicators for wrong encoding
handling.
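
For illustration, the usual replacement pattern looks roughly like this: wrap
the byte stream and name the encoding explicitly. This is a generic JDK sketch
with made-up class and method names, not code taken from Maven or plexus-utils.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingIo {

    // Use instead of new FileWriter(file), which takes the platform default.
    static void write(File file, String text, Charset encoding) throws IOException {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), encoding)) {
            writer.write(text);
        }
    }

    // Use instead of new FileReader(file), which takes the platform default.
    static String read(File file, Charset encoding) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new FileInputStream(file), encoding)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("encoding-demo", ".txt");
        file.deleteOnExit();
        String text = "Herv\u00e9"; // escaped so this source stays pure ASCII
        write(file, text, StandardCharsets.UTF_8);
        System.out.println("same text back: " + text.equals(read(file, StandardCharsets.UTF_8)));
    }
}
```

With the encoding passed in by the caller, the result no longer depends on
-Dfile.encoding or the machine's locale.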

> I found that building any plugin fails: there are problems when scanning
> mojo.

I haven't yet had the time to set up a nice IT for this, but the fix is
here: MPLUGIN-101. Some related encoding problems for the plugin tools are
also already hanging around in JIRA.


Benjamin


Re: Maven and File Encoding

Posted by Hervé BOUTEMY <he...@free.fr>.
I like the idea: it is easier than getting access to a Z/OS box (is anyone able
to deploy a CI on a Z/OS box?), even if not every aspect is covered (for
example, it doesn't fail if the encoding parameter has not been set in the
compiler plugin).

But if we do so on a whole CI server, I fear the havoc will be bigger than 
expected and we'll need a lot of work to fix everything and have a stable 
situation: I have been working on encoding in XML files for a long time now,
which is one part of the overall encoding question, and even for this part, I
know that not everything is fixed yet.

I just tried manually to check the result of such a configuration.

I found that building any plugin fails: there are problems when scanning mojo. 
We'll need to fix this general problem before setting this option on a whole 
CI server. The problem starts with maven-plugin-plugin: once the 2.4.1 release
is out (for Maven 2.0.9), we'll be able to work on it.

I was able to build components/branches/maven-2.0.x: with PLX-343 and PLX-367 
being fixed in svn but not integrated in Maven 2.0.x, I expected a failure 
while reading classworlds configuration then plexus configuration. I was 
surprised, but it seems to work better than expected (I'd really like to 
understand why...).


All in all, if the configuration can be set component by component, I have no 
problem.
But if the whole CI is impacted, I think we'll need some work before setting 
the configuration.

regards,

Hervé

On Friday 21 March 2008, Benjamin Bentmann wrote:
> Hi,
>
> There are still several code spots in Maven that rely on the platform's
> default encoding when processing text files which does not support build
> reproducibility. It could help developers to detect such defects if the CI
> machines were configured to run Maven with
>
>   MAVEN_OPTS=-Dfile.encoding=UTF-16
>
> This setting will wreak havoc on any component that assumes ASCII, UTF-8 or
> Latin-1 when converting between characters and bytes.
>
> Ciao,
>
>
> Benjamin


Re: Maven and File Encoding

Posted by Igor Fedorenko <ig...@ifedorenko.com>.
It might be a good idea to run CI with the IBM JDK even on more common 
hardware. For example, I could not build Maven with the IBM JDK the last time
I tried. I did not have much time to investigate, but it looked like the IBM 
JDK could not read modello's pom.xml as UTF-8.

--
Regards,
Igor

Hervé BOUTEMY wrote:
> AFAIK, the problem doesn't show for Japanese or Chinese developers: even in 
> Japan or China, the platform encoding on Unix or Windows is ASCII-based, 
> precisely to avoid problems with "simple" ASCII characters.
> 
> As Benjamin says, the encoding issue is more subtle.
> 
> There is one situation I know of where this encoding issue prevents a
> developer from using Maven: Z/OS, where the platform encoding is EBCDIC,
> which is not ASCII-based. MANTTASKS-14 is a report of such a problem.
> 
> But as there are not many Java developers on Z/OS, there are few reports and 
> few testers for improvements: hence Benjamin's good idea to force 
> UTF-16 on CI machines.
> 
> BTW, if anybody has a Z/OS box, I'll be glad to have some testers on 
> MANTTASKS-14 :)
> 
> regards
> 
> Hervé
> 
> On Monday 24 March 2008, Benjamin Bentmann wrote:
>>> does this mean Maven actually never worked for Japanese or Chinese
>>> developers?
>> It depends: As long as one sticks to US-ASCII when editing source files and
>> uses a platform encoding that has US-ASCII as a subset (UTF-8, Latin-X,
>> ...), you won't notice Maven's misbehavior. Also, using Non-ASCII
>> characters could work if all development machines use the same default
>> encoding (preferably UTF-8 for best harmony with the XML readers/writers).
>>
>> The encoding issue is really subtle because it only shows up in
>> certain/rare situations. That's why I suggest to configure the CI machines
>> to UTF-16 some day in the future such that at least some tests run in an
>> edge-case environment.
>>
>>
>> Benjamin


Re: Maven and File Encoding

Posted by Hervé BOUTEMY <he...@free.fr>.
AFAIK, the problem doesn't show for Japanese or Chinese developers: even in 
Japan or China, the platform encoding on Unix or Windows is ASCII-based, 
precisely to avoid problems with "simple" ASCII characters.

As Benjamin says, the encoding issue is more subtle.

There is one situation I know of where this encoding issue prevents a
developer from using Maven: Z/OS, where the platform encoding is EBCDIC,
which is not ASCII-based. MANTTASKS-14 is a report of such a problem.

But as there are not many Java developers on Z/OS, there are few reports and 
few testers for improvements: hence Benjamin's good idea to force 
UTF-16 on CI machines.

BTW, if anybody has a Z/OS box, I'll be glad to have some testers on 
MANTTASKS-14 :)

regards

Hervé

On Monday 24 March 2008, Benjamin Bentmann wrote:
> > does this mean Maven actually never worked for Japanese or Chinese
> > developers?
>
> It depends: As long as one sticks to US-ASCII when editing source files and
> uses a platform encoding that has US-ASCII as a subset (UTF-8, Latin-X,
> ...), you won't notice Maven's misbehavior. Also, using Non-ASCII
> characters could work if all development machines use the same default
> encoding (preferably UTF-8 for best harmony with the XML readers/writers).
>
> The encoding issue is really subtle because it only shows up in
> certain/rare situations. That's why I suggest to configure the CI machines
> to UTF-16 some day in the future such that at least some tests run in an
> edge-case environment.
>
>
> Benjamin


Re: Maven and File Encoding

Posted by Benjamin Bentmann <be...@udo.edu>.
> does this mean Maven actually never worked for Japanese or Chinese 
> developers?

It depends: As long as one sticks to US-ASCII when editing source files and 
uses a platform encoding that has US-ASCII as a subset (UTF-8, Latin-X, 
...), you won't notice Maven's misbehavior. Also, using Non-ASCII characters 
could work if all development machines use the same default encoding 
(preferably UTF-8 for best harmony with the XML readers/writers).
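
The subset relationship can be checked directly. In this sketch (the class name
is illustrative), pure ASCII text yields the same bytes under UTF-8 and Latin-1
but not under UTF-16, while a single non-ASCII character already differs
between UTF-8 and Latin-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiSubsetDemo {

    // True if the text is encoded to identical bytes by both charsets.
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    public static void main(String[] args) {
        String ascii = "pom.xml";
        String nonAscii = "Herv\u00e9"; // escaped so this source stays pure ASCII
        System.out.println("ASCII text, UTF-8 vs Latin-1:     "
                + sameBytes(ascii, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        System.out.println("ASCII text, UTF-8 vs UTF-16:      "
                + sameBytes(ascii, StandardCharsets.UTF_8, StandardCharsets.UTF_16));
        System.out.println("non-ASCII text, UTF-8 vs Latin-1: "
                + sameBytes(nonAscii, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```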

The encoding issue is really subtle because it only shows up in certain/rare 
situations. That's why I suggest to configure the CI machines to UTF-16 some 
day in the future such that at least some tests run in an edge-case 
environment.


Benjamin 


Re: Maven and File Encoding

Posted by Milos Kleint <mk...@gmail.com>.
just curious..

does this mean Maven actually never worked for Japanese or Chinese developers?

Milos

On Fri, Mar 21, 2008 at 8:15 PM, Benjamin Bentmann
<be...@udo.edu> wrote:
> Hi,
>
>  There are still several code spots in Maven that rely on the platform's
>  default encoding when processing text files which does not support build
>  reproducibility. It could help developers to detect such defects if the CI
>  machines were configured to run Maven with
>
>   MAVEN_OPTS=-Dfile.encoding=UTF-16
>
>  This setting will wreak havoc on any component that assumes ASCII, UTF-8 or
>  Latin-1 when converting between characters and bytes.
>
>  Ciao,
>
>
>  Benjamin