Posted to dev@tomee.apache.org by Mohammad Nour El-Din <mn...@apache.org> on 2007/12/31 20:56:44 UTC

Source code files character encoding problems

Hi Folks...

  I've just checked out the latest code and used the Eclipse Maven plugin to
import the source code as Eclipse projects. I was surprised that some projects
did not build successfully because some Java files could not be read in
UTF-8 character encoding mode. I solved the problem by opening the files and
saving them in the UTF-8 character encoding, but I need some explanation of
why that happened, and why not with all files?

-- 
Thanks
- Mohammad Nour
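
For anyone wanting to find the affected files before re-saving them, a minimal Java sketch follows. It is not from the thread: the class name Utf8Check and its helper are made up for illustration, and it assumes that "could not be read in UTF-8" means the bytes fail a strict UTF-8 decode.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the project.
class Utf8Check {

    /** Returns true if the bytes decode as strict UTF-8 with no malformed sequences. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Checks every file path given on the command line. */
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(arg));
            String verdict = isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8";
            System.out.println(arg + ": " + verdict);
        }
    }
}
```

Files reported as not valid would be the candidates for the re-save described above. Pure ASCII files always pass, which may be part of why only some files were affected.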

Re: Source code files character encoding problems

Posted by David Blevins <da...@visi.com>.
On Jan 2, 2008, at 8:16 AM, Mohammad Nour El-Din wrote:

> Yeah, but this happens to the Java source files themselves, not XML
> files!

Right, the issue is the same.

-David




Re: Source code files character encoding problems

Posted by Mohammad Nour El-Din <no...@gmail.com>.
Yeah, but this happens to the Java source files themselves, not XML files!

On Jan 1, 2008 1:19 AM, David Blevins <da...@visi.com> wrote:

>
> On Dec 31, 2007, at 11:56 AM, Mohammad Nour El-Din wrote:
>
> > Hi Folks...
> >
> > I've just checked out the latest code and used the Eclipse Maven plugin
> > to import the source code as Eclipse projects. I was surprised that some
> > projects did not build successfully because some Java files could not be
> > read in UTF-8 character encoding mode. I solved the problem by opening
> > the files and saving them in the UTF-8 character encoding, but I need
> > some explanation of why that happened, and why not with all files?
>
> Someone complained about this recently as well.
>
> http://www.nabble.com/RE%3A-newbie-questions-p14294655.html
>
> From the description in that email it seems like setting the
> "svn:keywords" on the files has led to encoding issues.
>
> I don't know if there's a way to keep the svn:keywords and avoid the
> encoding issues.
>
> -David
>
>


-- 
Thanks
- Mohammad Nour

Re: Source code files character encoding problems

Posted by "Daniel S. Haischt" <da...@googlemail.com>.
David Blevins wrote:
>> As a side note: encoding="UTF-8" doesn't mean that the file is UTF-8
>> encoded per se (only explicitly telling the editor, e.g. notepad, to
>> save the file as UTF-8 ensures that it is written to the file system
>> UTF-8 encoded). That's something I tried to point out earlier.
> 
> Yes, that's exactly what I said.  I just don't think it's Eclipse doing 
> it which is what I tried to point out.  Seems more likely that it's his 
> svn client.
> 

If he is on windoze and this issue popped up on XML files, it could be a
windoze issue as well, because an XML file containing encoding="UTF-8"
needs to have the hex bytes EF BB BF at the very beginning of the
file.

Note: EF BB BF won't be visible if using notepad or another text editor.
You need to open a file in hex mode to see this kind of character.

I know this sounds strange but it's real. I've gone through this already :)

Cheers
Daniel
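
The three bytes mentioned above can be checked without a hex editor. The sketch below is an illustration only: BomCheck and its methods are hypothetical names, not anything from the project. One caveat worth stating: the XML 1.0 specification says UTF-8 entities may begin with a byte order mark, so its absence alone should not make a conforming parser reject a file declared as encoding="UTF-8".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for inspecting the first bytes of a file's content.
class BomCheck {

    // The UTF-8 byte order mark: EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the content begins with the UTF-8 byte order mark. */
    static boolean startsWithUtf8Bom(byte[] content) {
        if (content.length < UTF8_BOM.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(content, UTF8_BOM.length), UTF8_BOM);
    }

    /** Returns the content without a leading BOM, or the same array if there is none. */
    static byte[] stripUtf8Bom(byte[] content) {
        return startsWithUtf8Bom(content)
                ? Arrays.copyOfRange(content, UTF8_BOM.length, content.length)
                : content;
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>"
                .getBytes(StandardCharsets.UTF_8);

        // Build a second copy with the BOM prepended.
        byte[] withBom = new byte[UTF8_BOM.length + xml.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(xml, 0, withBom, UTF8_BOM.length, xml.length);

        System.out.println("plain content has BOM:    " + startsWithUtf8Bom(xml));
        System.out.println("prefixed content has BOM: " + startsWithUtf8Bom(withBom));
    }
}
```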

Re: Source code files character encoding problems

Posted by David Blevins <da...@visi.com>.
On Jan 5, 2008, at 2:26 PM, Daniel S. Haischt wrote:

> Wouldn't it be possible to just remove the keywords temporarily and let
> Mohammad check whether the issue still persists? That way you would be
> more or less certain that it's related to the SVN keywords and thus
> that removing them is a legit approach.

Mohammad, can you give that a try?  You should be able to just delete  
the svn keywords section on your local copy and build.

> As a side note: encoding="UTF-8" doesn't mean that the file is UTF-8
> encoded per se (only explicitly telling the editor, e.g. notepad, to
> save the file as UTF-8 ensures that it is written to the file system
> UTF-8 encoded). That's something I tried to point out earlier.

Yes, that's exactly what I said.  I just don't think it's Eclipse  
doing it which is what I tried to point out.  Seems more likely that  
it's his svn client.

-David



Re: Source code files character encoding problems

Posted by "Daniel S. Haischt" <da...@googlemail.com>.
Wouldn't it be possible to just remove the keywords temporarily and let
Mohammad check whether the issue still persists? That way you would be
more or less certain that it's related to the SVN keywords and thus
that removing them is a legit approach.

As a side note: encoding="UTF-8" doesn't mean that the file is UTF-8
encoded per se (only explicitly telling the editor, e.g. notepad, to
save the file as UTF-8 ensures that it is written to the file system
UTF-8 encoded). That's something I tried to point out earlier.




Re: Source code files character encoding problems

Posted by Jacek Laskowski <ja...@laskowski.net.pl>.
On Jan 5, 2008 10:59 PM, David Blevins <da...@visi.com> wrote:

> I don't think that's it.  The user complained that his *maven* build
> failed and that he had non-UTF8 characters in the part where svn
> substitutes the keywords.  The xml files clearly state
> encoding="UTF-8" and his svn client is adding non-UTF8 characters when
> it edits the file while adding the keywords, making it unparsable, I'd
> guess, by any valid xml parser that honors the 'encoding' attribute.

Yes, you're right. I remember it happened with the bg_BG locale, when an
XML file declared as UTF-8 contained some unparsable letters. But if svn
is changing the keywords, how could it be that other projects don't
suffer from it too?

Jacek

-- 
Jacek Laskowski
http://www.JacekLaskowski.pl

Re: Source code files character encoding problems

Posted by David Blevins <da...@visi.com>.
On Jan 5, 2008, at 8:14 AM, Jacek Laskowski wrote:

> On Jan 5, 2008 1:20 PM, Daniel S. Haischt <daniel.haischt@googlemail.com> wrote:
>> Again, I think this is a pure Eclipse issue unless someone else who
>> opens the files in, say, IntelliJ reports the exact same issue.
>
> I've been working with Eclipse, NetBeans and IntelliJ recently and none
> has reported trouble opening the files, so I think you're right. Thanks,
> Daniel, for the report, as I wouldn't have figured that out myself.

I don't think that's it.  The user complained that his *maven* build
failed and that he had non-UTF8 characters in the part where svn
substitutes the keywords.  The xml files clearly state
encoding="UTF-8" and his svn client is adding non-UTF8 characters when
it edits the file while adding the keywords, making it unparsable, I'd
guess, by any valid xml parser that honors the 'encoding' attribute.

Regardless, even if we can explain it, that doesn't make it go away; it
still needs to be fixed.  The only fix I see is to yank the keywords
if svn is going to mix character encodings on us.  Could be just
TortoiseSVN, but still.

Any other thoughts or proposed solutions?

-David
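
To make the suspected failure mode concrete, here is a small Java sketch. Everything in it is assumed for illustration: the expanded $Date$ text is invented, ISO-8859-1 merely stands in for whatever locale encoding an svn client might use, and nothing here confirms that this is what svn or TortoiseSVN actually does. It only shows that one locale-encoded non-ASCII character inside an otherwise UTF-8 file is enough to make a strict UTF-8 decode fail.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the project.
class KeywordEncodingDemo {

    // Invented keyword expansion containing one non-ASCII character (e with acute accent).
    static final String EXPANDED =
            "$Date: 2008-02-05 10:00:00 +0100 (mar., 05 f\u00e9vr. 2008) $";

    /** Builds a UTF-8 XML file whose keyword text is encoded with the given charset. */
    static byte[] buildFile(Charset keywordCharset) {
        byte[][] parts = {
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!-- ".getBytes(StandardCharsets.UTF_8),
            EXPANDED.getBytes(keywordCharset),
            " -->\n<root/>\n".getBytes(StandardCharsets.UTF_8)
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    /** Strict UTF-8 validation: malformed byte sequences are reported, not replaced. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] consistent = buildFile(StandardCharsets.UTF_8);
        byte[] mixed = buildFile(StandardCharsets.ISO_8859_1);
        System.out.println("keyword written as UTF-8:      valid = " + isValidUtf8(consistent));
        System.out.println("keyword written as ISO-8859-1: valid = " + isValidUtf8(mixed));
    }
}
```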


Re: Source code files character encoding problems

Posted by Jacek Laskowski <ja...@laskowski.net.pl>.
On Jan 5, 2008 1:20 PM, Daniel S. Haischt <da...@googlemail.com> wrote:
> Again, I think this is a pure Eclipse issue unless someone else who
> opens the files in, say, IntelliJ reports the exact same issue.

I've been working with Eclipse, NetBeans and IntelliJ recently and none
has reported trouble opening the files, so I think you're right. Thanks,
Daniel, for the report, as I wouldn't have figured that out myself.

Jacek

-- 
Jacek Laskowski
http://www.JacekLaskowski.pl

Re: Source code files character encoding problems

Posted by "Daniel S. Haischt" <da...@googlemail.com>.
Again, I think this is a pure Eclipse issue unless someone else who
opens the files in, say, IntelliJ reports the exact same issue.

See:

http://www.ryanlowe.ca/blog/archives/001328_default_file_encoding_issues.php

He had the same issue: the files checked into the SCM are NOT UTF-8
encoded, but on Linux the default encoding for Eclipse is UTF-8, and
thus Eclipse tries to open them as UTF-8 encoded files.

This yields a -> Syntax error on token "Invalid Character"

If I interpret Mohammad correctly, he tries to execute javac from within
Eclipse, and I think because Eclipse uses the UTF-8 encoding on Linux, the
encoding being used by javac is forced to UTF-8 as well. Thus the error.

Thoughts ?
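
A rough Java sketch of the mismatch described here, under the assumption that a source file was saved in a single-byte encoding (ISO-8859-1 is used as a stand-in) and is later read as UTF-8. The class name and the sample comment line are invented. Java's String constructor substitutes U+FFFD for bytes it cannot decode, which is roughly the kind of invalid character a tool would then trip over.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the project.
class DefaultEncodingDemo {

    // Invented source line with one non-ASCII character (u with diaeresis).
    static final String SOURCE_LINE = "// Author: J\u00fcrgen";

    /** Decodes raw file bytes the way a tool configured with the given charset would. */
    static String readAs(byte[] fileBytes, Charset charset) {
        return new String(fileBytes, charset);
    }

    public static void main(String[] args) {
        // What this JVM would assume when no encoding is specified.
        System.out.println("Platform default charset: " + Charset.defaultCharset());

        // Simulate a file that was saved as ISO-8859-1.
        byte[] onDisk = SOURCE_LINE.getBytes(StandardCharsets.ISO_8859_1);

        String right = readAs(onDisk, StandardCharsets.ISO_8859_1);
        String wrong = readAs(onDisk, StandardCharsets.UTF_8);

        System.out.println("Read as ISO-8859-1 matches original: " + right.equals(SOURCE_LINE));
        System.out.println("Read as UTF-8 contains U+FFFD:       " + (wrong.indexOf('\uFFFD') >= 0));
    }
}
```

javac accepts an -encoding option that states the source encoding explicitly, which avoids depending on the platform or IDE default.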



Re: Source code files character encoding problems

Posted by David Blevins <da...@visi.com>.
On Jan 2, 2008, at 12:04 PM, Dain Sundstrom wrote:

> On Dec 31, 2007, at 3:19 PM, David Blevins wrote:
>
>> On Dec 31, 2007, at 11:56 AM, Mohammad Nour El-Din wrote:
>>
>>> Hi Folks...
>>>
>>> I've just checked out the latest code and used the Eclipse Maven
>>> plugin to import the source code as Eclipse projects. I was surprised
>>> that some projects did not build successfully because some Java files
>>> could not be read in UTF-8 character encoding mode. I solved the
>>> problem by opening the files and saving them in the UTF-8 character
>>> encoding, but I need some explanation of why that happened, and why
>>> not with all files?
>>
>> Someone complained about this recently as well.
>>
>> http://www.nabble.com/RE%3A-newbie-questions-p14294655.html
>>
>> From the description in that email it seems like setting the
>> "svn:keywords" on the files has led to encoding issues.
>>
>> I don't know if there's a way to keep the svn:keywords and avoid  
>> the encoding issues.
>
> I'm not a big fan of the keywords anyway, so if you want to drop  
> them, I'm for it.

I'd prefer to give Jacek a chance to resolve the encoding issue before
we yank them.  Jacek, do you have any idea what's going on?

-David


Re: Source code files character encoding problems

Posted by Dain Sundstrom <da...@iq80.com>.
On Dec 31, 2007, at 3:19 PM, David Blevins wrote:

> On Dec 31, 2007, at 11:56 AM, Mohammad Nour El-Din wrote:
>
>> Hi Folks...
>>
>> I've just checked out the latest code and used the Eclipse Maven
>> plugin to import the source code as Eclipse projects. I was surprised
>> that some projects did not build successfully because some Java files
>> could not be read in UTF-8 character encoding mode. I solved the
>> problem by opening the files and saving them in the UTF-8 character
>> encoding, but I need some explanation of why that happened, and why
>> not with all files?
>
> Someone complained about this recently as well.
>
> http://www.nabble.com/RE%3A-newbie-questions-p14294655.html
>
> From the description in that email it seems like setting the
> "svn:keywords" on the files has led to encoding issues.
>
> I don't know if there's a way to keep the svn:keywords and avoid  
> the encoding issues.

I'm not a big fan of the keywords anyway, so if you want to drop  
them, I'm for it.

-dain

Re: Source code files character encoding problems

Posted by David Blevins <da...@visi.com>.
On Dec 31, 2007, at 11:56 AM, Mohammad Nour El-Din wrote:

> Hi Folks...
>
> I've just checked out the latest code and used the Eclipse Maven plugin
> to import the source code as Eclipse projects. I was surprised that some
> projects did not build successfully because some Java files could not be
> read in UTF-8 character encoding mode. I solved the problem by opening
> the files and saving them in the UTF-8 character encoding, but I need
> some explanation of why that happened, and why not with all files?

Someone complained about this recently as well.

http://www.nabble.com/RE%3A-newbie-questions-p14294655.html

From the description in that email it seems like setting the
"svn:keywords" on the files has led to encoding issues.

I don't know if there's a way to keep the svn:keywords and avoid the  
encoding issues.

-David