Posted to general@jakarta.apache.org by sebb <se...@gmail.com> on 2007/08/18 14:37:43 UTC

Re: svn commit: r567258 - in /jakarta/site: docs/ docs/site/ docs/site/downloads/ docs/site/news/ docs/site/pmc/ xdocs/stylesheets/

On 18/08/07, tetsuya@apache.org <te...@apache.org> wrote:
> Author: tetsuya
> Date: Sat Aug 18 04:14:52 2007
> New Revision: 567258
>
> URL: http://svn.apache.org/viewvc?view=rev&rev=567258
> Log:
> Apache DB (http://db.apache.org/), especially OJB and Torque had been in Jakarta - "Ex-Jakarta". - Now I am using "CHCP 1252" mode (for ISO-8859-1 characters)
>

Is there a way to fix build.xml so that the user's default encoding
does not affect the output? Or perhaps we could add a check and warn
if the encoding is wrong?

The xml source files are already flagged as ISO-8859-1, as is the
stylesheet, which uses output encoding ISO-8859-1 as well, which one
might have hoped would be enough...

As an alternative, I tried to generate the output to use &ouml;/&uuml;
etc instead of the iso-8859-1 characters, but could not work out how
to code this in the source XML without generating errors.
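
For illustration only, here is a minimal Python sketch of the idea of emitting character references instead of raw ISO-8859-1 bytes. It is not part of the Jakarta site build, and the function names are hypothetical. Because the result is pure ASCII, it does not depend on the platform default encoding.

```python
from html.entities import codepoint2name


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric reference such as &#252;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def to_named_refs(text: str) -> str:
    """Replace non-ASCII characters with HTML named entities where one exists."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII passes through unchanged.
            parts.append(ch)
        elif code in codepoint2name:
            # Named HTML entity, e.g. 252 -> &uuml;
            parts.append("&" + codepoint2name[code] + ";")
        else:
            # No named entity known: fall back to a numeric reference.
            parts.append("&#%d;" % code)
    return "".join(parts)
```

Named entities such as &uuml; are defined by HTML, not by XML itself, so an XML source can only use them if a DTD declares them. That may be related to the errors mentioned above; the numeric form (&#252;) needs no declaration.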

It might solve the problem if the XSLT output format were changed from
xml to html, but I assume there are good reasons for using xml as the
output format, and it would mean updating every page.

S///

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@jakarta.apache.org
For additional commands, e-mail: general-help@jakarta.apache.org


Re: svn commit: r567258

Posted by Roland Weber <os...@dubioso.net>.
The JDK version used may also have something to do with it:
http://issues.apache.org/bugzilla/show_bug.cgi?id=38781

cheers,
  Roland



Re: svn commit: r567258

Posted by Roland Weber <os...@dubioso.net>.
Hi Sebastian,

> The u-umlaut characters were replaced by ?
> 
> [But I don't know exactly how the mangled version was generated.]
> 
> The output is currently generated in iso-8859-1 (or iso-8859-15); the
> input is specified using either an actual u-umlaut, or &#252;

That's a nasty one to track down. Apart from the encoding specified
in the style sheet, there's also the encoding in the <?xml?> line
of the source file to consider. The source file specifies
ISO-8859-1. I wonder whether svn might mangle the charset on
checkout or commit. Isn't there also a tool that does some
postprocessing in order to normalize the XML? If an XML processor
generates UTF-8 or UTF-16 instead of the specified ISO-8859-1, and
the next processor expects ISO-8859-* as input, the data could get
corrupted. You'd have to trace the whole chain from input to final
output.
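
As a rough illustration of that failure mode, the following Python sketch encodes text with one charset and decodes the bytes with another. It only demonstrates the general mechanism; nothing here is taken from the actual site build, and the helper name is hypothetical.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one charset, then decode the bytes with another.

    Bytes that are invalid in the reading charset become U+FFFD.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    # u-umlaut written as UTF-8 but read as ISO-8859-1 turns into two characters.
    print(ascii(misread("\u00fc", "utf-8", "iso-8859-1")))
    # u-umlaut written as ISO-8859-1 but read as UTF-8 is an invalid byte sequence.
    print(ascii(misread("\u00fc", "iso-8859-1", "utf-8")))
```

Each mismatch leaves a different fingerprint in the output, which can help to locate the step in the chain where the encoding changed.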

> I'll see about adding a check - should be easy enough to generate a
> dummy html file from an xml containing some accented characters and
> check that the result is as expected.

That's probably the best approach.

cheers,
  Roland




Re: svn commit: r567258 - in /jakarta/site: docs/ docs/site/ docs/site/downloads/ docs/site/news/ docs/site/pmc/ xdocs/stylesheets/

Posted by sebb <se...@gmail.com>.
On 19/08/07, Roland Weber <os...@dubioso.net> wrote:
> sebb wrote:
> > Is there a way to fix build.xml so that the user's default encoding
> > does not affect the output? Or perhaps we could add a check and warn
> > if the encoding is wrong?
> >
> > The xml source files are already flagged as ISO-8859-1, as is the
> > stylesheet, which uses output encoding ISO-8859-1 as well, which one
> > might have hoped would be enough...
>
> I don't know what the exact symptoms of the problem are.

Here is a sample diff:

http://svn.apache.org/viewvc/jakarta/site/docs/site/news/200206.html?r1=567256&r2=567257

The u-umlaut characters were replaced by ?

[But I don't know exactly how the mangled version was generated.]
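
One plausible mechanism, not confirmed for this commit, is an encoder substituting '?' for characters that the target charset cannot represent. The Python sketch below shows that behaviour in isolation; the function name is hypothetical and the choice of "ascii" as the lossy charset is only an assumption for the demonstration.

```python
def encode_lossy(text: str, charset: str) -> bytes:
    """Encode text, substituting '?' for characters the charset cannot represent."""
    return text.encode(charset, errors="replace")


if __name__ == "__main__":
    # A charset without u-umlaut silently loses the character.
    print(encode_lossy("M\u00fcller", "ascii"))
    # ISO-8859-1 can represent it as the single byte 0xFC.
    print(encode_lossy("M\u00fcller", "iso-8859-1"))
```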

> This is what the XSLT spec says about output encodings [1]:
>
> > The encoding attribute specifies the preferred encoding to use for
> > outputting the result tree. XSLT processors are required to respect
> > values of UTF-8 and UTF-16. For other values, if the XSLT processor
> > does not support the specified encoding it may signal an error; if
> > it does not signal an error it should use UTF-8 or UTF-16 instead.

Ah, thanks - that could well explain the problem.

> Is the output generated in UTF-8 or UTF-16? If so, the solution
> would be to declare one of those as the output encoding, since only
> those two are required to be supported by every XSLT processor.

The output is currently generated in iso-8859-1 (or iso-8859-15); the
input is specified using either an actual u-umlaut, or &#252;

Unfortunately changing to UTF-8 would mean changing all the html files...

I'll see about adding a check - should be easy enough to generate a
dummy html file from an xml containing some accented characters and
check that the result is as expected.
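
The kind of check described here could look roughly like the Python sketch below. It is not the check that went into build.xml; the function name and the sample text are assumptions made for the illustration. The idea is to compare the generated bytes with the bytes the sample text should have in the intended output encoding.

```python
def check_generated_bytes(generated: bytes, expected_text: str,
                          encoding: str = "iso-8859-1") -> bool:
    """Return True if the generated bytes are exactly expected_text in the given encoding."""
    return generated == expected_text.encode(encoding)


if __name__ == "__main__":
    sample = "M\u00fcller"
    # Correct ISO-8859-1 output passes.
    print(check_generated_bytes(b"M\xfcller", sample))
    # Output with the umlaut replaced by '?' fails.
    print(check_generated_bytes(b"M?ller", sample))
```

Comparing raw bytes rather than decoded text keeps the check itself independent of the default encoding of the machine running it.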

> cheers,
>  Roland
>
> [1] http://www.w3.org/TR/xslt#section-XML-Output-Method



Re: svn commit: r567258 - in /jakarta/site: docs/ docs/site/ docs/site/downloads/ docs/site/news/ docs/site/pmc/ xdocs/stylesheets/

Posted by Roland Weber <os...@dubioso.net>.
sebb wrote:
> Is there a way to fix build.xml so that the user's default encoding
> does not affect the output? Or perhaps we could add a check and warn
> if the encoding is wrong?
> 
> The xml source files are already flagged as ISO-8859-1, as is the
> stylesheet, which uses output encoding ISO-8859-1 as well, which one
> might have hoped would be enough...

I don't know what the exact symptoms of the problem are.
This is what the XSLT spec says about output encodings [1]:

> The encoding attribute specifies the preferred encoding to use for
> outputting the result tree. XSLT processors are required to respect
> values of UTF-8 and UTF-16. For other values, if the XSLT processor
> does not support the specified encoding it may signal an error; if
> it does not signal an error it should use UTF-8 or UTF-16 instead.

Is the output generated in UTF-8 or UTF-16? If so, the solution
would be to declare one of those as the output encoding, since only
those two are required to be supported by every XSLT processor.

cheers,
  Roland

[1] http://www.w3.org/TR/xslt#section-XML-Output-Method



Re: svn commit: r567258 - in /jakarta/site: docs/ docs/site/ docs/site/downloads/ docs/site/news/ docs/site/pmc/ xdocs/stylesheets/

Posted by sebb <se...@gmail.com>.
On 18/08/07, tetsuya@apache.org <te...@apache.org> wrote:
>
> I personally think that providing proper "build.sh"/"build.bat"
> scripts would be sufficient.

Perhaps, but I don't know what's needed to force the correct
character-set, and shell scripts are not as portable as Ant build
scripts.

So I updated the build script to do a check of the charset conversion.

Maybe the simplest solution now would be to update the error message
with details of what tweaks might be needed to get the build working
on a system which does not default to iso-8859-1?
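
As a sketch of what such a diagnostic could report, the Python snippet below tests whether a given encoding produces the same bytes as ISO-8859-1 for a few accented characters, and builds a message about the platform default. It is an illustration under assumptions, not the Ant check itself; the function names and the sample string are hypothetical.

```python
import locale

SAMPLE = "\u00e4\u00f6\u00fc"  # a-umlaut, o-umlaut, u-umlaut


def encodes_like_latin1(encoding: str, sample: str = SAMPLE) -> bool:
    """Return True if the encoding yields the same bytes as ISO-8859-1 for the sample."""
    try:
        return sample.encode(encoding) == sample.encode("iso-8859-1")
    except (UnicodeEncodeError, LookupError):
        # The sample cannot be represented, or the encoding name is unknown.
        return False


def describe_default_encoding() -> str:
    """Build a message about the default encoding of the current platform."""
    name = locale.getpreferredencoding(False)
    if encodes_like_latin1(name):
        return "default encoding %s matches ISO-8859-1 for the sample" % name
    return ("default encoding %s differs from ISO-8859-1; "
            "generated pages may contain mangled characters" % name)


if __name__ == "__main__":
    print(describe_default_encoding())
```

Windows code page 1252 passes this test for the sample, which is consistent with the "CHCP 1252" workaround mentioned in the commit log; UTF-8 and plain ASCII do not.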

> Kind regards,
>
> -- Tetsuya. (tetsuya@apache.org)



Re: svn commit: r567258 - in /jakarta/site: docs/ docs/site/ docs/site/downloads/ docs/site/news/ docs/site/pmc/ xdocs/stylesheets/

Posted by te...@apache.org.
I personally think that providing proper "build.sh"/"build.bat"
scripts would be sufficient.

Kind regards,

-- Tetsuya. (tetsuya@apache.org)

---------------------------------------------------------------------
Tetsuya Kitahata --  Terra-International, Inc. - President - 
E-mail: kitahata@terra-intl.com 	http://www.terra-intl.com/
Apache News Online			http://www.apachenews.org/

