Posted to dev@uima.apache.org by Baptiste Gaillard <b_...@hotmail.com> on 2009/11/10 15:09:22 UTC

Encoding problems with Eclipse/JCasGen

Hi,

I'm using Maven to build my Type System JAR archives.

My Maven project configuration contains the following:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <version>2.0.2</version>
  <configuration>
    <encoding>UTF-8</encoding>
    <source>1.6</source>
    <target>1.6</target>
  </configuration>
</plugin>

So I want my source files to use the UTF-8 encoding. This keeps the source code 'portable' and avoids a Windows-specific encoding.
But I'm French, and I want to have accents in my Type System description.
If I enter a description in French with accents, this does not work, because it seems that JCasGen does not take into account the encoding specified in Eclipse for Java source files.

For example, when I compile the Java files generated by JCasGen, I get errors like this:

D:\......\IndividualSiB_Type.java:[15,15] unmappable character for encoding UTF-8

This is because the following lines are generated in IndividualSiB_Type:
/** Correspond � un bloc image d�un item particulier du corpus BMS EDF Offline.
 * Updated by JCasGen Tue Nov 10 14:54:57 CET 2009
 * @generated */

But I should have:
/** Correspond à un bloc image d'un item particulier du corpus BMS EDF Offline.
 * Updated by JCasGen Tue Nov 10 14:54:57 CET 2009
 * @generated */

So I think this is a bug?
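
To illustrate the kind of mismatch that could produce the output above, here is a minimal Java sketch. It assumes the generated file was written in a single-byte encoding and then read as UTF-8, as the compiler plugin is configured to do. ISO-8859-1 is used as a stand-in, since the encoding actually used on the original machine is not stated in this message. The class and method names are hypothetical and are not part of JCasGen or UIMA.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** Encodes text with one charset, then decodes the resulting bytes with another. */
    public static String writeThenRead(String text, Charset writeCharset, Charset readCharset) {
        byte[] bytes = text.getBytes(writeCharset);
        return new String(bytes, readCharset);
    }

    public static void main(String[] args) {
        // \u00e0 is 'à'; the escape keeps this source file itself independent of any encoding.
        String description = "Correspond \u00e0 un bloc image";

        // Assumption: written as ISO-8859-1, read back as UTF-8.
        String mismatched = writeThenRead(description,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        // Same charset on both sides: the text survives unchanged.
        String consistent = writeThenRead(description,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println("mismatched contains U+FFFD: " + mismatched.contains("\uFFFD"));
        System.out.println("consistent equals original: " + consistent.equals(description));
    }
}
```

Under these assumptions, the single byte written for 'à' is not a valid UTF-8 sequence on its own, so the decoder substitutes U+FFFD, which is the '�' visible in the generated comment. javac reports the same condition as an unmappable character.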

Thanks, 

Baptiste

 		 	   		  

RE: Encoding problems with Eclipse/JCasGen

Posted by Baptiste Gaillard <b_...@hotmail.com>.
OK, in fact this is perhaps because I'm using uima-2.2.2-incubating: there is no UTF-8 encoding information in its parent POM, whereas there is in the SVN version of UIMA.

So I think this problem will disappear with UIMA 2.3.

Thanks,

Baptiste

> Date: Tue, 10 Nov 2009 15:29:51 +0100
> Subject: Re: Encoding problems with Eclipse/JCasGen
> From: tommaso.teofili@gmail.com
> To: uima-dev@incubator.apache.org
> 
> Hi Baptiste,
> just a clarification in case you missed it: the encoding you specify in the
> Maven POM is one thing, and the encoding you can specify for a project (or
> for the entire workspace) in Eclipse is another. One can override the other,
> depending on your Eclipse configuration (mvn eclipse:eclipse or m2eclipse).
>
> In the uimaj parent POM there is already an
> <encoding>UTF-8</encoding>
> tag for the resources, javadoc and compiler plugins, for the reason you
> mentioned (mainly portability), so this is strange behaviour.
>
> I tried putting the description 'Correspond à un bloc image' in the
> description of a Type in the Type System, and the generated file comment is
> as expected:
> /** Correspond à un bloc image
>  * Updated by JCasGen Tue Nov 10 15:27:25 CET 2009
> ...
> 
> So maybe something elsewhere is telling your Eclipse which encoding to use
> when generating files.
> Hope that helps.
> Tommaso
 		 	   		  

Re: Encoding problems with Eclipse/JCasGen

Posted by Tommaso Teofili <to...@gmail.com>.
Hi Baptiste,
just a clarification in case you missed it: the encoding you specify in the
Maven POM is one thing, and the encoding you can specify for a project (or
for the entire workspace) in Eclipse is another. One can override the other,
depending on your Eclipse configuration (mvn eclipse:eclipse or m2eclipse).

In the uimaj parent POM there is already an
<encoding>UTF-8</encoding>
tag for the resources, javadoc and compiler plugins, for the reason you
mentioned (mainly portability), so this is strange behaviour.

I tried putting the description 'Correspond à un bloc image' in the
description of a Type in the Type System, and the generated file comment is
as expected:
/** Correspond à un bloc image
 * Updated by JCasGen Tue Nov 10 15:27:25 CET 2009
...

So maybe something elsewhere is telling your Eclipse which encoding to use
when generating files.
Hope that helps.
Tommaso
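
As a sketch of how the encoding used to generate a file can differ from the one declared in the POM: Java code that writes text without naming a charset falls back to the platform default, which is independent of the Maven configuration. Whether JCasGen does this is not established in this thread. The example below only contrasts the default with an explicitly chosen charset, and ExplicitCharsetDemo and render are hypothetical names, not UIMA APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    /** Renders text to bytes through a Writer that uses the given charset explicitly. */
    public static byte[] render(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(buffer, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // The platform default is what a writer uses when no charset is named.
        System.out.println("Platform default charset: " + Charset.defaultCharset());

        String accent = "\u00e0"; // 'à', written as an escape to keep this file ASCII-only

        // The same character produces different bytes depending on the charset.
        System.out.println("UTF-8 byte count: "
                + render(accent, StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 byte count: "
                + render(accent, StandardCharsets.ISO_8859_1).length);
    }
}
```

Passing the charset explicitly makes the generated bytes independent of the machine the generator runs on, so they can be matched to the encoding declared for the compiler.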
