Posted to user@forrest.apache.org by Joao Miguel Ferreira <jm...@estg.ipvc.pt> on 2004/10/01 00:21:58 UTC

non english languages (was: Simple question from a newbie)

Hello all,
Hello Thorsten,

Now I know what was wrong: it seems that Forrest "doesn't like
Portuguese"...

If I write bad Portuguese (without á, é, à, ó, ê, etc.) I get
"BUILD SUCCESSFUL".

If I write good Portuguese (with á, é, à, ó, ê, etc.) I get "BUILD FAILED"
on the file containing those characters.

Sorry about the confusion... I'm just starting.

So, next question: how do I tell Forrest that my site is written in
Portuguese?

And: how can I get more verbosity from "forrest validate"?

Thank you

jmf




> > At the end of section "3 - Seeding a new Project" the document tells me
> > to copy "index.xml" to "my-new-file.xml" and so I did. I added some text
> > in the correct places (so I think !!!)
> > 
> 
> Nope, you may have an unclosed tag somewhere, or used tags that are not 
> allowed by the DTD.
> 
> > BUILD FAILED
> > /home/jmf/forrest/apache-forrest-0.5.1-bin/forrest.build.xml:851: Could
> > not validate document
> > /home/jmf/forrest_work/teste1/src/documentation/content/xdocs/my-new-file.xml
> > 



Re: non english languages (was: Simple question from a newbie)

Posted by Mariano Goldman <ma...@gmail.com>.
I write xdocs in Spanish, which also has special characters (such as á, é,
ñ, etc.), so I need to use the Latin-1 charset: change the XML
declaration (the first line of the XML file) to
<?xml version="1.0" encoding="ISO-8859-1"?>
That should work.
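A likely explanation, and an assumption since the thread never says which editor was used, is that the files were saved as Latin-1 bytes while the declaration claimed UTF-8, so the parser rejected the first accented byte. Declaring ISO-8859-1 makes the declaration match the bytes. The Python sketch below uses only the standard library's xml.etree.ElementTree, not anything in Forrest, to reproduce both cases. The sample text and the helper name parse_with_declaration are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text, encoded the way a Latin-1 (ISO-8859-1) editor saves it.
TEXT = "Olá, João! Não há problema."
LATIN1_BODY = f"<p>{TEXT}</p>".encode("iso-8859-1")


def parse_with_declaration(encoding_label: str, raw_body: bytes) -> str:
    """Prepend an XML declaration to raw_body and parse the result.

    Returns the element text on success, or a short error string if the
    parser rejects the bytes.
    """
    declaration = f'<?xml version="1.0" encoding="{encoding_label}"?>\n'
    try:
        return ET.fromstring(declaration.encode("ascii") + raw_body).text
    except ET.ParseError as err:
        return f"ParseError: {err}"


# Bytes are Latin-1 but the declaration says UTF-8: the parser rejects "á".
print(parse_with_declaration("UTF-8", LATIN1_BODY))
# The declaration matches the bytes: the text parses cleanly.
print(parse_with_declaration("ISO-8859-1", LATIN1_BODY))
```

Either fix works as long as the declaration and the bytes on disk agree; the alternative is to leave the declaration as UTF-8 and save the file as UTF-8.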

Regards,

Mariano


On 01 Oct 2004 12:13:16 +1000, David Crossley <cr...@apache.org> wrote:
> David Crossley wrote:
> > Joao Miguel Ferreira wrote:
> > >
> > > Now I know what was wrong: it seems that Forrest "doesn't like
> > > Portuguese"...
> >
> > What is the xml encoding at the first line of your
> > xml source files?
> 
> Actually the best thing would be to show us.
> Please create an entry at the Forrest issue tracker.
> http://issues.cocoondev.org/secure/BrowseProject.jspa?id=10000
> 
> Do 'forrest seed', then add a tiny bit of text to
> the index.xml and do 'forrest' to be sure that the
> issue is still there. Then attach that index.xml to
> the issue tracker. That avoids any issues with our
> mail agents interpreting your "foreign" characters.
> 
> --
> David Crossley
> 
>

Re: non english languages

Posted by Joao Miguel Ferreira <jm...@estg.ipvc.pt>.
Hello all,

Sorry about my long silence. I've been out for some days.

I was the one starting this thread about "non english languages".

I finally tested the proposed solutions and it worked: I replaced
"UTF-8" with "ISO-8859-1" in the XML declaration, and everything works fine.

Thank you all, once again.

João Ferreira




Re: non english languages

Posted by Johannes Schaefer <jo...@uidesign.de>.
David Crossley wrote:

>Johannes Schaefer wrote:
><snip/>
>  
>
>>I believe this depends on which language you're using,
>>so it might be different for German/Spanish/Latin and
>>Greek or Russian ... UTF-8 might work for all but it
>>may be difficult to read (let alone type!).
>>
>>OK. I'm not an expert here. I dug around on the web and
>>found that UTF-8 is one of several encodings of Unicode.
>>
>>The Unicode character charts are here:
>>   http://www.unicode.org/charts/
>>and a FAQ here:  http://www.unicode.org/faq/utf_bom.html
>>
>>Other sources about character encoding (for HTML) include, e.g.:
>>   http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist.html
>>   http://www.cs.tut.fi/~jkorpela/html/chars.html
>>
>>So maybe someone with a deeper understanding can
>>give us better advice on which encoding to use ...
>>    
>>
>
>Interesting synchronicity on Cocoon Users, perhaps related:
>Subject: XHTML and Entities: &apos; fails in IE
>http://mail-archives.apache.org/eyebrowse/ReadMsg?listId=57&msgNo=38770
>
>  
>
It looks like there is something similar already in the FAQ:


    FAQ 2.12: How to use special characters in the labels of the
    site.xml file?

The answer there is to use numeric character references (&#xxx;).
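Numeric character references sidestep the encoding question, because the file itself stays pure ASCII. A minimal Python sketch, assuming you want to convert existing text rather than type the references by hand: the standard "xmlcharrefreplace" codec error handler emits decimal &#NNN; references, and any XML parser expands them back. The <entry label="..."/> element is a made-up stand-in, not Forrest's actual site.xml schema.

```python
import xml.etree.ElementTree as ET


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#NNN; reference.

    Note: this does not escape &, < or > in the input; that is still the
    job of whatever writes the XML.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


label = "Instalação e configuração"
ascii_label = to_char_refs(label)
print(ascii_label)  # Instala&#231;&#227;o e configura&#231;&#227;o

# Hypothetical element; the parser expands the references back to "ç" and "ã".
entry = ET.fromstring(f'<entry label="{ascii_label}"/>')
print(entry.get("label") == label)  # True
```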

js

-- 
User Interface Design GmbH * Teinacher Str. 38 * D-71634 Ludwigsburg
Fon +49 (0)7141 377 000 * Fax  +49 (0)7141 377 00-99
Geschäftsstelle: User Interface Design GmbH * Lehrer-Götz-Weg 11 * D-81825 München
www.uidesign.de

Buch "User Interface Tuning" von Joachim Machate & Michael Burmester 
www.user-interface-tuning.de

Attraktivität von interaktiven Produkten messen mit 
www.attrakdiff.de


Re: non english languages

Posted by David Crossley <cr...@apache.org>.
Johannes Schaefer wrote:
<snip/>
> 
> I believe this depends on which language you're using,
> so it might be different for German/Spanish/Latin and
> Greek or Russian ... UTF-8 might work for all but it
> may be difficult to read (let alone type!).
> 
> OK. I'm not an expert here. I dug around on the web and
> found that UTF-8 is one of several encodings of Unicode.
> 
> The Unicode character charts are here:
>    http://www.unicode.org/charts/
> and a FAQ here:  http://www.unicode.org/faq/utf_bom.html
> 
> Other sources about character encoding (for HTML) include, e.g.:
>    http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist.html
>    http://www.cs.tut.fi/~jkorpela/html/chars.html
> 
> So maybe someone with a deeper understanding can
> give us better advice on which encoding to use ...

Interesting synchronicity on Cocoon Users, perhaps related:
Subject: XHTML and Entities: &apos; fails in IE
http://mail-archives.apache.org/eyebrowse/ReadMsg?listId=57&msgNo=38770

-- 
David Crossley


Re: non english languages

Posted by David Crossley <cr...@apache.org>.
Johannes Schaefer wrote:
> 
> OK. I'm not an expert here. I dug around on the web and
> found that UTF-8 is one of several encodings of Unicode.
> 
> The Unicode character charts are here:
>    http://www.unicode.org/charts/
> and a FAQ here:  http://www.unicode.org/faq/utf_bom.html
> 
> Other sources about character encoding (for HTML) include, e.g.:
>    http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist.html
>    http://www.cs.tut.fi/~jkorpela/html/chars.html
> 
> So maybe someone with a deeper understanding can
> give us better advice on which encoding to use ...

While skimming the blogs of people from the recent
Cocoon GetTogether, I found that this issue came up as a topic.
Bertrand summarised the talk:
http://codeconsult.ch/bertrand/archives/000391.html
<quote>
Lots of interesting points on how to handle this properly, including
* Being careful about HTTP headers matching charset definitions in html
elements.
* Using UTF-8 encoding as the only one that is found everywhere.
* Alan Wood's Unicode resources. http://www.alanwood.net/unicode/
</quote>
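If you follow the "UTF-8 everywhere" advice instead of switching the declaration to Latin-1, existing files have to be converted. A rough Python sketch, assuming the input really is Latin-1 (every byte sequence decodes as Latin-1, so a wrong guess is silently accepted rather than reported) and that the declaration uses the usual encoding="..." form. latin1_to_utf8 is a hypothetical helper, not a Forrest tool.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding="..." (or '...') part of a leading XML declaration.
DECL_ENCODING = re.compile(rb"""^(<\?xml[^>]*?encoding=)(["'])[^"']*\2""")


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode a Latin-1 XML document as UTF-8 and fix its declaration."""
    # Any byte sequence decodes as Latin-1, so this never fails; it is only
    # correct if the file really was written in Latin-1.
    utf8 = raw.decode("iso-8859-1").encode("utf-8")
    # Point the declaration at UTF-8. A document without a declaration
    # already defaults to UTF-8, so it is left unchanged.
    return DECL_ENCODING.sub(rb"\1\2UTF-8\2", utf8, count=1)


src = ('<?xml version="1.0" encoding="ISO-8859-1"?>\n'
       "<p>Olá, João</p>").encode("iso-8859-1")
converted = latin1_to_utf8(src)
print(converted.decode("utf-8"))
print(ET.fromstring(converted).text)
```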

-- 
David Crossley


Re: non english languages

Posted by Johannes Schaefer <jo...@uidesign.de>.
David Crossley wrote:

> Johannes Schaefer wrote:
> 
>>Johannes Schaefer wrote:
>>
>>>We had this problem, too, with German umlauts: äöüß.
>>>The solution was to declare the encoding:
>>><?xml version="1.0" encoding="UTF-8"?>
>>>  gives the special characters, but they are ugly two-byte
>>>  sequences
>>><?xml version="1.0" encoding="iso-8859-1"?>
>>>  shows the special characters nicely in the source
>>>I don't know which one is better from a technical/Forrest
>>>point of view.
>>>
>>>Boa sorte com isso! (Good luck with that!)
>>>Johannes
>>
>>Afterthought: this might be worth mentioning in the FAQ.
> 
> 
> Yes. Does someone have an authoritative URL that we can
> refer to about this common topic?
> 
> As to Forrest's requirements, it will process documents
> that are standard XML instances with the proper encoding.
> So do it whichever way, as long as it is legitimate xml.
> 

I believe this depends on which language you're using,
so it might be different for German/Spanish/Latin and
Greek or Russian ... UTF-8 might work for all but it
may be difficult to read (let alone type!).

OK. I'm not an expert here. I dug around on the web and
found that UTF-8 is one of several encodings of Unicode.

The Unicode character charts are here:
   http://www.unicode.org/charts/
and a FAQ here:  http://www.unicode.org/faq/utf_bom.html

Other sources about character encoding (for HTML) include, e.g.:
   http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist.html
   http://www.cs.tut.fi/~jkorpela/html/chars.html

So maybe someone with a deeper understanding can
give us better advice on which encoding to use ...

Johannes


Re: non english languages

Posted by David Crossley <cr...@apache.org>.
Johannes Schaefer wrote:
> Johannes Schaefer wrote:
> > 
> > We had this problem, too, with German umlauts: äöüß.
> > The solution was to declare the encoding:
> > <?xml version="1.0" encoding="UTF-8"?>
> >   gives the special characters, but they are ugly two-byte
> >   sequences
> > <?xml version="1.0" encoding="iso-8859-1"?>
> >   shows the special characters nicely in the source
> > I don't know which one is better from a technical/Forrest
> > point of view.
> > 
> > Boa sorte com isso! (Good luck with that!)
> > Johannes
> 
> Afterthought: this might be worth mentioning in the FAQ.

Yes. Does someone have an authoritative URL that we can
refer to about this common topic?

As to Forrest's requirements, it will process documents
that are standard XML instances with the proper encoding.
So do it whichever way, as long as it is legitimate xml.

-- 
David Crossley


Re: non english languages

Posted by Johannes Schaefer <jo...@uidesign.de>.
Johannes Schaefer wrote:

> David Crossley wrote:
> 
>> David Crossley wrote:
>>
>>> Joao Miguel Ferreira wrote:
>>>
>>>> Now I know what was wrong: it seems that Forrest "doesn't like
>>>> Portuguese"...
>>>
>>>
>>> What is the xml encoding at the first line of your
>>> xml source files?
>>
>>
>>
>> Actually the best thing would be to show us.
>> Please create an entry at the Forrest issue tracker.
>> http://issues.cocoondev.org/secure/BrowseProject.jspa?id=10000
>>
>> Do 'forrest seed', then add a tiny bit of text to
>> the index.xml and do 'forrest' to be sure that the
>> issue is still there. Then attach that index.xml to
>> the issue tracker. That avoids any issues with our
>> mail agents interpreting your "foreign" characters.
>>
> 
> 
> We had this problem, too, with German umlauts: äöüß.
> The solution was to declare the encoding:
> <?xml version="1.0" encoding="UTF-8"?>
>   gives the special characters, but they are ugly two-byte
>   sequences
> <?xml version="1.0" encoding="iso-8859-1"?>
>   shows the special characters nicely in the source
> I don't know which one is better from a technical/Forrest
> point of view.
> 
> Boa sorte com isso! (Good luck with that!)
> Johannes
> 
> 

Afterthought: this might be worth mentioning in the FAQ.
js




Re: non english languages

Posted by Johannes Schaefer <jo...@uidesign.de>.
David Crossley wrote:

> David Crossley wrote:
> 
>>Joao Miguel Ferreira wrote:
>>
>>>Now I know what was wrong: it seems that Forrest "doesn't like
>>>Portuguese"...
>>
>>What is the xml encoding at the first line of your
>>xml source files?
> 
> 
> Actually the best thing would be to show us.
> Please create an entry at the Forrest issue tracker.
> http://issues.cocoondev.org/secure/BrowseProject.jspa?id=10000
> 
> Do 'forrest seed', then add a tiny bit of text to
> the index.xml and do 'forrest' to be sure that the
> issue is still there. Then attach that index.xml to
> the issue tracker. That avoids any issues with our
> mail agents interpreting your "foreign" characters.
> 


We had this problem, too, with German umlauts: äöüß.
The solution was to declare the encoding:
<?xml version="1.0" encoding="UTF-8"?>
   gives the special characters, but they are ugly two-byte
   sequences
<?xml version="1.0" encoding="iso-8859-1"?>
   shows the special characters nicely in the source
I don't know which one is better from a technical/Forrest
point of view.
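The "ugly two-byte sequences" are most likely (an assumption, not confirmed in the thread) what an editor or viewer set to Latin-1 shows when it opens a UTF-8 file: UTF-8 stores each of these characters as two bytes, ISO-8859-1 as one. A short Python sketch showing the bytes and the resulting garbled display; encodings_of is a hypothetical helper.

```python
UMLAUTS = "äöüß"


def encodings_of(text: str) -> list:
    """Return (character, UTF-8 bytes, ISO-8859-1 bytes) for each character."""
    return [(ch, ch.encode("utf-8"), ch.encode("iso-8859-1")) for ch in text]


for ch, utf8, latin1 in encodings_of(UMLAUTS):
    # bytes.hex(sep) needs Python 3.8 or later.
    print(f"{ch}: UTF-8 {utf8.hex(' ')} | ISO-8859-1 {latin1.hex(' ')}")

# UTF-8 bytes misread as Latin-1 turn each umlaut into two characters:
print("äöü".encode("utf-8").decode("iso-8859-1"))  # Ã¤Ã¶Ã¼
```

The file content is correct in both cases; only the editor's interpretation of the bytes differs.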

Boa sorte com isso! (Good luck with that!)
Johannes



Re: non english languages (was: Simple question from a newbie)

Posted by David Crossley <cr...@apache.org>.
David Crossley wrote:
> Joao Miguel Ferreira wrote:
> > 
> > Now I know what was wrong: it seems that Forrest "doesn't like
> > Portuguese"...
> 
> What is the xml encoding at the first line of your
> xml source files?

Actually the best thing would be to show us.
Please create an entry at the Forrest issue tracker.
http://issues.cocoondev.org/secure/BrowseProject.jspa?id=10000

Do 'forrest seed', then add a tiny bit of text to
the index.xml and do 'forrest' to be sure that the
issue is still there. Then attach that index.xml to
the issue tracker. That avoids any issues with our
mail agents interpreting your "foreign" characters.

-- 
David Crossley


Re: non english languages (was: Simple question from a newbie)

Posted by David Crossley <cr...@apache.org>.
Joao Miguel Ferreira wrote:
> Hello all,
> Hello Thorsten,
> 
> Now I know what was wrong: it seems that Forrest "doesn't like
> Portuguese"...
> 
> If I write bad Portuguese (without á, é, à, ó, ê, etc.) I get
> "BUILD SUCCESSFUL".
> 
> If I write good Portuguese (with á, é, à, ó, ê, etc.) I get "BUILD FAILED"
> on the file containing those characters.
> 
> Sorry about the confusion... I'm just starting.
> 
> So, next question: how do I tell Forrest that my site is written in
> Portuguese?
>
<snipped another unrelated question/>

What is the xml encoding at the first line of your
xml source files?

-- 
David Crossley