Posted to j-users@xalan.apache.org by Igor Hersht <ig...@ca.ibm.com> on 2003/11/11 18:07:16 UTC
Re:Internalization A package for code common for XSLTC, xalan interpretive
and XQuery.
It looks like I had problems sending the note with an attachment.
I think we have two issues here: packaging and the internationalization (in
particular collation) specs.
I would go with org.apache.xalan.common.internalization for the common
code.
The internationalization (in particular collation) specs leave a lot of
implementation-defined freedom. As I understand it, the specs cannot be
more specific because they depend on the underlying implementations. After
a lot of discussion we came up with a specific spec, which I think we
should implement for all our processors (Xalan interpretive, XSLTC,
Xalan C++). (Actually, I wrote a draft implementation of the specs for
XSLTC.) The specs could be changed in the future if XSLT 2.0 becomes less
ambiguous (e.g. with respect to the default collation).
I also think that not only collation but also the lang attribute could be
more specific and have common rules across the different xsl elements.
Our specs are still in draft form and obviously not perfect.
Discussion would be appreciated.
_______________________________________________________________________
3.3 xalan, xsltc, xalanC collation specifications
3.3.1 xsl-collation:collation element
(xmlns:xsl-collation="http://xml.apache.org/xalan/collation").
<!-- Category: declaration -->
<xsl-collation:collation
  name = uri-reference
  lang = nmtoken
  decomposition = "no" | "canonical" | "full"
  strength = "primary" | "secondary" | "tertiary" | "identical"
  rules = string
  default = "yes" | "no" />
The xsl-collation:collation element is a top-level element. It is a
collation declaration and is used to define collating sequences. The
collation ID is a URI, defined in the mandatory name attribute. Absolute
and relative URIs are treated alike: a relative URI serves only as a
collation identifier and therefore should not be resolved against any
base. The processor is not required to check that the name is a valid URI.
(The behavior is undefined if it is not.) An error must be issued if the
name value is an empty string or equals the Unicode codepoint collation URI
(http://www.w3.org/2003/05/xpath-functions/collation/codepoint).
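The two error cases for the name attribute could be checked roughly as follows. This is only a sketch: the class and method names (CollationName, check) and the choice of IllegalArgumentException are assumptions made for illustration, not part of the draft or of Xalan.

```java
// Hypothetical helper for illustration; not part of Xalan.
final class CollationName {
    // Codepoint collation URI as quoted in the draft.
    static final String CODEPOINT_URI =
            "http://www.w3.org/2003/05/xpath-functions/collation/codepoint";

    /** Returns the name unchanged, or throws where the draft requires an error. */
    static String check(String name) {
        if (name == null) {
            throw new IllegalArgumentException("collation: name is mandatory");
        }
        if (name.length() == 0) {
            throw new IllegalArgumentException("collation: name must not be empty");
        }
        if (name.equals(CODEPOINT_URI)) {
            throw new IllegalArgumentException(
                    "collation: name must not be the codepoint collation URI");
        }
        // Deliberately no URI syntax check and no resolution against a base
        // URI: a relative name is used as an opaque identifier.
        return name;
    }
}
```

Note that the sketch compares names as plain strings, which matches the rule that relative URIs are identifiers only.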
The other attributes are optional:
* lang: follows the rules of the xml:lang attribute, with some
clarifications.
Not specifying lang means the default language. It is an error if lang is
invalid according to IETF RFC 1766 (see also
http://www.w3.org/XML/xml-19980210-errata).
If lang is specified, it is used to construct the ISO-639 language code,
the ISO-3166 country code and a variant code.
The mapping from the lang attribute to the ISO-639 language code and the
ISO-3166 country code is described in
http://www.w3.org/TR/1998/REC-xml-19980210#sec-lang-tag.
The default ISO-639 language is used if no language code has been
specified. It is also used if the implementation cannot find resources for
the constructed ISO-639 language code; a warning message should be issued
in this case.
The default country is used if no country code has been specified. The
default country for the given ISO-639 language is also used if the
implementation cannot find resources for the constructed ISO-3166 country
code; a warning message should be issued in this case.
The variant code is implementation defined. We construct it from the
substring after the second tag separator by converting it to upper case and
replacing all "-" characters with "_". The variant code is ignored if the
implementation cannot find resources for it with the given ISO-639 language
and ISO-3166 country code; a warning message should be issued in this case.
Changes in behavior caused by the variant are implementation defined.
* decomposition: determines how the collator handles Unicode composed
characters (see the JDK 1.2 documentation for details). Not specifying
decomposition means the default decomposition for the specified lang
attribute.
* strength: sets the strength of the collator (see the JDK 1.2
documentation for details). Not specifying strength means the default
strength for the specified lang attribute.
* rules: sets the rules to be used by a RuleBasedCollator. The rules are
applied as a "modifier" of the given rules (see the JDK 1.2 documentation
for details). For example, if "a < b < c < d" according to the original
rules and rules = "b < a", then the modified rules are "b < a < c < d". It
is an error if the rules value is an empty string.
* default: the value "yes" indicates that this collation is to be used as
the default collation. Not specifying default means "no".
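The construction of the language, country and variant codes from the lang attribute could look roughly like this in Java. It is a sketch under assumptions: the class name LangToLocale and the regular expression used for the RFC 1766 check are illustrative, and the fallback to defaults with a warning when resources are missing is not modelled.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Xalan.
final class LangToLocale {
    // RFC 1766: the primary tag and each subtag are 1 to 8 ASCII letters.
    private static final Pattern RFC1766 =
            Pattern.compile("[A-Za-z]{1,8}(-[A-Za-z]{1,8})*");

    @SuppressWarnings("deprecation") // Locale constructors are deprecated in recent JDKs
    static Locale toLocale(String lang) {
        if (lang == null) {
            return Locale.getDefault();      // lang not specified: default language
        }
        if (!RFC1766.matcher(lang).matches()) {
            throw new IllegalArgumentException("Invalid lang: " + lang);
        }
        // Split at the first two tag separators only; the rest is the variant.
        String[] parts = lang.split("-", 3);
        String language = parts[0];
        String country = parts.length > 1 ? parts[1] : "";
        String variant = parts.length > 2
                ? parts[2].toUpperCase(Locale.ROOT).replace('-', '_')
                : "";
        return new Locale(language, country, variant);
    }
}
```

For example, LangToLocale.toLocale("en-us-posix-foo") yields language "en", country "US" and variant "POSIX_FOO"; the Locale constructor itself normalizes the case of the language and country codes.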
A collation element attribute should be ignored if an implementation cannot
process it and it is not specified above as an error. A warning message
should be issued in this case. No attribute other than those specified
above is legal. An error message should be issued in this case.
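A minimal sketch of how the strength, decomposition and rules attributes might be applied to a java.text.Collator is given below. The class CollationAttributes and its methods are hypothetical. It also assumes that the rules modifier is written in java.text rule syntax with a leading reset (for example "& b < a"); the draft writes rules = "b < a" and does not say how that is translated.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Xalan.
final class CollationAttributes {
    static final Set<String> LEGAL = new HashSet<String>(Arrays.asList(
            "name", "lang", "decomposition", "strength", "rules", "default"));

    /** Builds a RuleBasedCollator, wrapping the checked ParseException. */
    static RuleBasedCollator fromRules(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad collation rules: " + rules, e);
        }
    }

    /** Applies rules, strength and decomposition to a copy of the base collator. */
    static Collator apply(Collator base, Map<String, String> attrs) {
        for (String attr : attrs.keySet()) {
            if (!LEGAL.contains(attr)) {      // unknown attribute: error
                throw new IllegalArgumentException("Illegal attribute: " + attr);
            }
        }
        Collator result = (Collator) base.clone();
        String rules = attrs.get("rules");
        if (rules != null) {
            if (rules.length() == 0) {
                throw new IllegalArgumentException("rules must not be empty");
            }
            if (base instanceof RuleBasedCollator) {
                // Modifier: append the extra rules to the original ones.
                String original = ((RuleBasedCollator) base).getRules();
                result = fromRules(original + " " + rules);
                result.setStrength(base.getStrength());
                result.setDecomposition(base.getDecomposition());
            } else {                          // cannot process: warn and ignore
                System.err.println("Warning: rules ignored, collator is not rule based");
            }
        }
        String strength = attrs.get("strength");
        if (strength != null) {
            result.setStrength(strengthOf(strength));
        }
        String decomposition = attrs.get("decomposition");
        if (decomposition != null) {
            result.setDecomposition(decompositionOf(decomposition));
        }
        return result;
    }

    static int strengthOf(String value) {
        switch (value) {
            case "primary":   return Collator.PRIMARY;
            case "secondary": return Collator.SECONDARY;
            case "tertiary":  return Collator.TERTIARY;
            case "identical": return Collator.IDENTICAL;
            default: throw new IllegalArgumentException("Bad strength: " + value);
        }
    }

    static int decompositionOf(String value) {
        switch (value) {
            case "no":        return Collator.NO_DECOMPOSITION;
            case "canonical": return Collator.CANONICAL_DECOMPOSITION;
            case "full":      return Collator.FULL_DECOMPOSITION;
            default: throw new IllegalArgumentException("Bad decomposition: " + value);
        }
    }
}
```

With a base collator built from the rules "< a < b < c < d" and the modifier "& b < a", the resulting collator sorts "b" before "a" and "a" before "c", which corresponds to the "b < a < c < d" example in the rules description.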
3.3.2 Usage in XQuery and XPath Functions and in xsl elements
Collations may be used in the following XQuery and XPath functions (see
http://www.w3.org/TR/xquery-operators): fn:compare, fn:starts-with,
fn:ends-with, fn:contains, fn:substring-before, fn:substring-after,
fn:index-of, fn:distinct-values, fn:deep-equal, fn:max, fn:min,
fn:default-collation.
Their usage is specified in http://www.w3.org/TR/xquery-operators/#charmod
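As an illustration of how a function such as fn:compare might delegate to a collator, here is a small sketch. The class FnCompare is hypothetical, and it assumes the collation URI has already been resolved to a java.text.Collator.

```java
import java.text.Collator;

// Hypothetical helper for illustration; not part of Xalan.
final class FnCompare {
    /** Returns -1, 0 or 1 depending on how a sorts relative to b. */
    static int compare(String a, String b, Collator collation) {
        // Collator.compare may return any negative or positive value,
        // so normalize it to the -1/0/1 range that fn:compare reports.
        return Integer.signum(collation.compare(a, b));
    }
}
```

With an English collator set to primary strength, "a" and "A" would compare as equal here, while "a" and "b" would not.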
Collations can also be used in xsl elements: xsl:for-each-group, xsl:key
and xsl:sort.
The xsl:sort element has case-order and lang attributes and therefore has
some additional rules, which are specified in http://www.w3.org/TR/xslt20
(13.2 The xsl:sort Element).
3.3.2.1 Collation name resolution
First, if a stylesheet contains two or more collation elements with the
same name, the one with the highest import precedence is used and those
with lower import precedence are eliminated from consideration. It is an
error for a stylesheet to contain two or more collation elements with the
same name once the elements with lower import precedence have been
eliminated from consideration.
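The elimination by import precedence can be sketched as below. The names NameResolution and Decl are hypothetical, and representing import precedence as an int where a larger value means higher precedence is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Xalan.
final class NameResolution {
    /** One collation declaration; a larger importPrecedence wins (assumption). */
    static final class Decl {
        final String name;
        final int importPrecedence;
        Decl(String name, int importPrecedence) {
            this.name = name;
            this.importPrecedence = importPrecedence;
        }
    }

    /** Keeps, per name, the declaration with the highest import precedence. */
    static Map<String, Decl> resolve(List<Decl> decls) {
        Map<String, Decl> winners = new HashMap<String, Decl>();
        Set<String> tied = new HashSet<String>();
        for (Decl d : decls) {
            Decl current = winners.get(d.name);
            if (current == null || d.importPrecedence > current.importPrecedence) {
                winners.put(d.name, d);   // new best; earlier ties no longer matter
                tied.remove(d.name);
            } else if (d.importPrecedence == current.importPrecedence) {
                tied.add(d.name);         // duplicate at the winning precedence
            }
        }
        if (!tied.isEmpty()) {
            throw new IllegalStateException("Duplicate collation names: " + tied);
        }
        return winners;
    }
}
```

A duplicate at a lower precedence is silently dropped, while a duplicate at the highest precedence for that name is reported as an error.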
If no default collation is specified, the Unicode codepoint collation is
used as the default collation. If two or more collation elements are
specified as the default collation, the one with the highest import
precedence is used. It is an error for a stylesheet to have two or more
collation elements specified as default with the same import precedence,
unless another collation element is specified as default and has a higher
import precedence.
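The choice of the default collation might be implemented along these lines. Again this is only a sketch: DefaultCollation and Decl are hypothetical names, and import precedence is assumed to be an int where a larger value means higher precedence.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Xalan.
final class DefaultCollation {
    static final String CODEPOINT_URI =
            "http://www.w3.org/2003/05/xpath-functions/collation/codepoint";

    /** One collation declaration; a larger importPrecedence wins (assumption). */
    static final class Decl {
        final String name;
        final int importPrecedence;
        final boolean isDefault;
        Decl(String name, int importPrecedence, boolean isDefault) {
            this.name = name;
            this.importPrecedence = importPrecedence;
            this.isDefault = isDefault;
        }
    }

    /** Returns the name of the default collation for the stylesheet. */
    static String resolve(List<Decl> decls) {
        Decl best = null;
        boolean tie = false;
        for (Decl d : decls) {
            if (!d.isDefault) {
                continue;
            }
            if (best == null || d.importPrecedence > best.importPrecedence) {
                best = d;        // a higher precedence overrides any earlier tie
                tie = false;
            } else if (d.importPrecedence == best.importPrecedence) {
                tie = true;      // two defaults at the currently best precedence
            }
        }
        if (best == null) {
            return CODEPOINT_URI;   // no default declared
        }
        if (tie) {
            throw new IllegalStateException(
                    "Two or more default collations with the same import precedence");
        }
        return best.name;
    }
}
```

Two defaults at the same lower precedence are tolerated as long as a single default with a higher precedence exists.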
It is an "Unsupported collation" error if a collation name is explicitly
referenced but is neither declared in a collation element nor the codepoint
collation URI. (This rule has been added to xsl elements to be consistent
with the XQuery and XPath Functions documentation.)
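A lookup that follows this rule could be sketched as follows. The class CollationLookup is hypothetical, and the codepoint comparator is one possible reading of "codepoint collation": it compares by Unicode code point rather than by UTF-16 code unit, which differs from String.compareTo for characters outside the Basic Multilingual Plane.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Map;

// Hypothetical helper for illustration; not part of Xalan.
final class CollationLookup {
    static final String CODEPOINT_URI =
            "http://www.w3.org/2003/05/xpath-functions/collation/codepoint";

    /** Compares strings code point by code point. */
    static final Comparator<String> CODEPOINT = new Comparator<String>() {
        public int compare(String a, String b) {
            int i = 0;
            int j = 0;
            while (i < a.length() && j < b.length()) {
                int ca = a.codePointAt(i);
                int cb = b.codePointAt(j);
                if (ca != cb) {
                    return ca < cb ? -1 : 1;
                }
                i += Character.charCount(ca);
                j += Character.charCount(cb);
            }
            // One string is exhausted; the one with characters left sorts later.
            return (a.length() - i) - (b.length() - j);
        }
    };

    /** Resolves an explicitly referenced collation name. */
    static Comparator<String> lookup(String name, Map<String, Collator> declared) {
        if (CODEPOINT_URI.equals(name)) {
            return CODEPOINT;
        }
        final Collator collator = declared.get(name);
        if (collator == null) {
            throw new IllegalArgumentException("Unsupported collation: " + name);
        }
        return (a, b) -> collator.compare(a, b);
    }
}
```

The map of declared collations is assumed to have been built from the collation elements after name resolution.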
3.3.4 xml:lang attribute
The xml:lang attribute is ignored (following the recommendations in
http://www.w3.org/TR/xquery-operators, section 7.3 Equality and Comparison
of Strings).
Igor Hersht
XSLT Development
IBM Canada Ltd., 8200 Warden Avenue, Markham, Ontario L6G 1C7
Office D2-260, Phone (905)413-3240 ; FAX (905)413-4839
----- Forwarded by Igor Hersht/Toronto/IBM on 11/10/2003 05:29 PM -----
From: Igor Hersht/Toronto/IBM@IBMCA
Date: 11/10/2003 04:50 PM
To: xalan-dev@xml.apache.org
cc: xalan-dev@xml.apache.org
Subject: Re: A package for code common for XSLTC, xalan interpretive and
XQuery.
Please respond to xalan-dev
I think we have two issues here: packaging and the internationalization (in
particular collation) specs.
I would go with org.apache.xalan.common.internalization for the common
code.
The internationalization (in particular collation) specs leave a lot of
implementation-defined freedom. As I understand it, the specs cannot be
more specific because they depend on the underlying implementations. After
a lot of discussion we came up with a specific spec
(See attached file: collation.doc)
which I think we should implement for all our processors (Xalan
interpretive, XSLTC, Xalan C++). (Actually, I wrote a draft implementation
of the specs for XSLTC.) The specs could be changed in the future if
XSLT 2.0 becomes less ambiguous (e.g. with respect to the default
collation).
I also think that not only collation but also the lang attribute could be
more specific and have common rules across the different xsl elements.
Our specs are still in draft form and obviously not perfect.
Discussion would be appreciated.
Igor Hersht
XSLT Development
IBM Canada Ltd., 8200 Warden Avenue, Markham, Ontario L6G 1C7
Office D2-260, Phone (905)413-3240 ; FAX (905)413-4839
From: david_marston@us.ibm.com
Date: 11/10/2003 02:34 PM
To: xalan-dev@xml.apache.org
cc:
Subject: Re: A package for code common for XSLTC, xalan interpretive and
XQuery.
Please respond to xalan-dev
Igor Hersht wrote:
>Maybe it could be good idea to have package
>(e.g. org.apache.xalan.common) where we could have such code
[for internationalization]
>as well as any code which is common for
>XSLT2.0, XPath 2.0, XQuery 1.0 and XPath 2.0 Functions and Operators.
Given that all the above specs treat collating as a special module,
I like the idea of isolating collators. We might even want to make it
easy for others to write special-purpose collators. With the proper
naming conventions, they could use their own collators in the F&O
functions that allow reference to a collator by "name" (actually by
URI). We also need to define the URIs for Xalan collators.
.................David Marston
**** Attachment collation.doc has been removed from this note on 10
November 2003 by Igor Hersht ****