Posted to j-users@xalan.apache.org by Igor Hersht <ig...@ca.ibm.com> on 2003/11/11 18:07:16 UTC
Re:Internalization A package for code common for XSLTC, xalan interpretive and XQuery.






It looks like I had problems sending the note with an attachment.

I think we have two issues here: packaging and internationalization (in
particular collation) specs.
I would go with org.apache.xalan.common.internalization for the common
code.
Internationalization (in particular collation) specs leave a lot of
implementation-defined freedom. As I understand it, the specs cannot be
more specific because they depend on the underlying implementations. After
a lot of discussion we came up with a specific spec, which I think we
should implement for all our processors (xalan interpretive, XSLTC,
xalan C++). (Actually, I wrote a draft implementation of the specs for
XSLTC.) The specs could be changed in the future if XSLT 2.0 becomes less
ambiguous (e.g. with respect to the default collation).
I also think that not only collation but also the lang attribute could be
more specific and have common rules across the different xsl elements.

Our specs are in draft form and obviously not perfect.
Discussion would be appreciated.

_______________________________________________________________________


   3.3 xalan, xsltc, xalanC collation specifications

   3.3.1 xsl-collation:collation element
   (xmlns:xsl-collation="http://xml.apache.org/xalan/collation" ).

   <!-- Category: declaration -->
   <xsl-collation:collation
             name = uri-reference
             lang = nmtoken
             decomposition = "no" | "canonical" | "full"
             strength = "primary" | "secondary" | "tertiary" | "identical"
             rules = string
             default = "yes" | "no"
             />

The xsl-collation:collation element is a top-level element. It is a
collation declaration and is used to define collating sequences. The
collation ID is a URI, defined in the mandatory name attribute. Absolute
and relative URIs are treated alike: a relative URI serves only as a
collation identifier and therefore should not be resolved against any
base. The processor is not required to check that the name is a valid URI.
(The behavior is undefined if the name is not a valid URI.) An error must
be issued if the name value is an empty string or equals the Unicode
codepoint collation URI
("http://www.w3.org/2003/05/xpath-functions/collation/codepoint").

The other attributes are optional:
*  lang: follows the rules of the xml:lang attribute, with the
clarifications below.
 Not specifying lang means the default language. It is an error if lang is
invalid according to the IETF RFC 1766 specification (see also
http://www.w3.org/XML/xml-19980210-errata).
If lang is specified, it is used to construct the ISO-639 language code,
the ISO-3166 country code and a variant code.
The mapping from the lang attribute to the ISO-639 language code and the
ISO-3166 country code is described in
http://www.w3.org/TR/1998/REC-xml-19980210#sec-lang-tag.
The default ISO-639 language is used if the language code has not been
specified. The default language is also used if the implementation cannot
find resources for the constructed ISO-639 language code. A warning message
should be issued in this case.
The default country is used if the country code has not been specified.
The default country for the given ISO-639 language is also used if the
implementation cannot find resources for the constructed ISO-3166 country
code. A warning message should be issued in this case.
A variant code is considered to be implementation defined. We construct
the code from the substring after the second tag separator by converting it
to upper case and replacing all "-" characters with "_". The variant code
is ignored if the implementation cannot find resources for the variant code
with the given ISO-639 language and ISO-3166 country code. A warning
message should be issued in this case. Changes in behavior caused by the
variant are implementation defined.

*  decomposition: determines how the collator handles Unicode composed
characters. (See the JDK 1.2 documentation for details.) Not specifying
decomposition means the default decomposition for the specified lang
attribute.

*  strength: sets the strength of the collator. (See the JDK 1.2
documentation for details.) Not specifying strength means the default
strength for the specified lang attribute.

*  rules: sets the rules to be used by a RuleBasedCollator. The rules
should be used as a "modifier" for the given rules. (See the JDK 1.2
documentation for details; e.g. if "a < b < c < d" according to the
original rules, and rules = "b < a", then the modified rules are
"b < a < c < d".) It is an error if the rules value is an empty string.

*  default: the value "yes" indicates that this collation is to be used as
the default collation. Not specifying default means "no".
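
For illustration, the lang rules above could be sketched in Java roughly as
follows. This is only a sketch under assumptions: the class and method names
(LangToLocale, toLocale) are hypothetical and not part of Xalan, the target
type is assumed to be java.util.Locale, and the fall-back-with-warning
behavior for missing resources is not modeled.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Xalan): maps a lang value to a java.util.Locale. */
final class LangToLocale {
    // RFC 1766 tag: a primary tag and subtags of 1 to 8 ASCII letters, separated by "-".
    private static final Pattern RFC1766 =
            Pattern.compile("[A-Za-z]{1,8}(-[A-Za-z]{1,8})*");

    private LangToLocale() {}

    static Locale toLocale(String lang) {
        if (lang == null) {
            return Locale.getDefault();      // lang not specified: default language
        }
        if (!RFC1766.matcher(lang).matches()) {
            throw new IllegalArgumentException("Invalid lang value: " + lang);
        }
        // Split into language, country, and everything after the second separator.
        String[] parts = lang.split("-", 3);
        String language = parts[0];
        String country = parts.length > 1 ? parts[1] : "";
        // Variant: upper-cased remainder with "-" replaced by "_".
        String variant = parts.length > 2
                ? parts[2].toUpperCase(Locale.ROOT).replace('-', '_')
                : "";
        return new Locale(language, country, variant);
    }
}
```

With this sketch, LangToLocale.toLocale("en-US-posix-x") would carry the
variant "POSIX_X", and a value such as "en_US" would be rejected as invalid.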

A collation element attribute should be ignored if an implementation cannot
process it and it is not specified above as an error. A warning message
should be issued in this case. No attribute other than those specified
above is legal; an error message should be issued if one is present.
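
As a rough sketch of how the strength, decomposition and rules attributes
might map onto java.text.Collator: the class CollatorAttributes and its
methods are hypothetical names, not existing Xalan code. It also assumes
that the rules value is written in java.text.RuleBasedCollator syntax, where
the "b < a" example above would be expressed with a reset, as "& b < a".
Warnings for attributes that cannot be processed are not modeled.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

/** Hypothetical helper (not part of Xalan): builds a Collator from attribute values. */
final class CollatorAttributes {
    private CollatorAttributes() {}

    static int strengthOf(String value) {
        switch (value) {
            case "primary":   return Collator.PRIMARY;
            case "secondary": return Collator.SECONDARY;
            case "tertiary":  return Collator.TERTIARY;
            case "identical": return Collator.IDENTICAL;
            default: throw new IllegalArgumentException("Bad strength: " + value);
        }
    }

    static int decompositionOf(String value) {
        switch (value) {
            case "no":        return Collator.NO_DECOMPOSITION;
            case "canonical": return Collator.CANONICAL_DECOMPOSITION;
            case "full":      return Collator.FULL_DECOMPOSITION;
            default: throw new IllegalArgumentException("Bad decomposition: " + value);
        }
    }

    /** Appends extra rules to the base rules, so the later rules modify the earlier ones. */
    static RuleBasedCollator tailor(RuleBasedCollator base, String extraRules)
            throws ParseException {
        if (extraRules.isEmpty()) {
            throw new IllegalArgumentException("rules must not be an empty string");
        }
        RuleBasedCollator tailored =
                new RuleBasedCollator(base.getRules() + " " + extraRules);
        // Keep the base collator's settings; a new collator starts from JDK defaults.
        tailored.setStrength(base.getStrength());
        tailored.setDecomposition(base.getDecomposition());
        return tailored;
    }

    /** strength, decomposition and rules may each be null, meaning "not specified". */
    static Collator build(Locale locale, String strength, String decomposition,
                          String rules) throws ParseException {
        Collator collator = Collator.getInstance(locale);
        if (rules != null) {
            if (rules.isEmpty()) {
                throw new IllegalArgumentException("rules must not be an empty string");
            }
            if (collator instanceof RuleBasedCollator) {
                collator = tailor((RuleBasedCollator) collator, rules);
            }
        }
        if (strength != null) {
            collator.setStrength(strengthOf(strength));
        }
        if (decomposition != null) {
            collator.setDecomposition(decompositionOf(decomposition));
        }
        return collator;
    }
}
```

Appending to getRules() follows the pattern shown in the RuleBasedCollator
documentation for combining collators; whether that is the right reading of
"modifier" in this draft is open for discussion.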

3.3.2  Usage in XQuery and XPath  Functions and in xsl elements

Collation may be used in XQuery and XPath  Functions (see
http://www.w3.org/TR/xquery-operators) fn:compare, fn:starts-with,
fn:ends-with, fn:contains, fn:substring-before, fn:substring-after,
fn:index-of, fn:distinct-values, fn:deep-equal, fn:max, fn:min,
fn:default-collation.
Usage is specified in http://www.w3.org/TR/xquery-operators/#charmod

Collation also can be used in xsl elements: xsl:for-each-group, xsl:key and
xsl:sort.

The xsl:sort element has case-order and lang attributes, and because of
them it has some additional rules, which are specified in
http://www.w3.org/TR/xslt20 (13.2 The xsl:sort Element).


3.3.2.1  Collation name resolution

First, if a stylesheet contains two or more collation elements that have
the same name, the one with the highest import precedence is used and the
elements with lower import precedence are eliminated from consideration.
It is an error for a stylesheet to contain two or more collation elements
with the same name once the elements with lower import precedence have
been eliminated.
If no default collation was specified, the Unicode codepoint collation is
used as the default collation. If two or more collation elements were
specified as the default collation, the one with the highest import
precedence is used. It is an error for a stylesheet to have two or more
collation elements specified as default with the same import precedence,
unless there is another collation element that is specified as default
and has a higher import precedence.
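
The resolution rules above could be sketched in Java along these lines. All
names here (CollationDeclarations, Declaration, resolveByName,
defaultCollationName) are hypothetical, import precedence is assumed to be
available as an int where a larger value means higher precedence, and the
default is chosen among all declarations passed in, which is one possible
reading of the draft.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not part of Xalan) of collation name and default resolution. */
final class CollationDeclarations {
    static final String CODEPOINT_URI =
            "http://www.w3.org/2003/05/xpath-functions/collation/codepoint";

    /** One xsl-collation:collation declaration, reduced to what resolution needs. */
    static final class Declaration {
        final String name;
        final boolean isDefault;
        final int importPrecedence;   // assumption: larger value = higher precedence

        Declaration(String name, boolean isDefault, int importPrecedence) {
            this.name = name;
            this.isDefault = isDefault;
            this.importPrecedence = importPrecedence;
        }
    }

    private CollationDeclarations() {}

    /** Keeps, per name, the declaration with the highest import precedence. */
    static Map<String, Declaration> resolveByName(List<Declaration> declarations) {
        Map<String, Declaration> winners = new HashMap<>();
        Set<String> tied = new HashSet<>();
        for (Declaration d : declarations) {
            Declaration best = winners.get(d.name);
            if (best == null || d.importPrecedence > best.importPrecedence) {
                winners.put(d.name, d);
                tied.remove(d.name);      // a higher precedence resolves earlier ties
            } else if (d.importPrecedence == best.importPrecedence) {
                tied.add(d.name);         // same name, same precedence
            }
        }
        if (!tied.isEmpty()) {
            throw new IllegalStateException("Duplicate collation names: " + tied);
        }
        return winners;
    }

    /** Returns the default collation name, or the codepoint URI if none is declared. */
    static String defaultCollationName(List<Declaration> declarations) {
        Declaration best = null;
        boolean tie = false;
        for (Declaration d : declarations) {
            if (!d.isDefault) {
                continue;
            }
            if (best == null || d.importPrecedence > best.importPrecedence) {
                best = d;
                tie = false;
            } else if (d.importPrecedence == best.importPrecedence) {
                tie = true;
            }
        }
        if (tie) {
            throw new IllegalStateException("More than one default collation");
        }
        return best == null ? CODEPOINT_URI : best.name;
    }
}
```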

It is an "Unsupported collation" error if a collation name is explicitly
referenced but is neither declared in a collation element nor equal to the
codepoint collation URI. (This rule has been added to xsl elements to be
consistent with the XQuery and XPath Functions documentation.)
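
A minimal Java sketch of this lookup rule, assuming declared collations are
held in a map from name to comparator; CollationLookup and its methods are
hypothetical names. The codepoint collation is implemented by comparing
Unicode code points, which differs from String.compareTo for characters
outside the Basic Multilingual Plane.

```java
import java.util.Comparator;
import java.util.Map;

/** Hypothetical sketch (not part of Xalan) of resolving a referenced collation name. */
final class CollationLookup {
    static final String CODEPOINT_URI =
            "http://www.w3.org/2003/05/xpath-functions/collation/codepoint";

    private CollationLookup() {}

    /** Compares two strings code point by code point. */
    static int compareCodepoints(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other (or they are equal): shorter sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    /** Returns the comparator for a referenced name, or fails as "Unsupported collation". */
    static Comparator<String> lookup(Map<String, Comparator<String>> declared,
                                     String name) {
        Comparator<String> found = declared.get(name);
        if (found != null) {
            return found;
        }
        if (CODEPOINT_URI.equals(name)) {
            return CollationLookup::compareCodepoints;
        }
        throw new IllegalArgumentException("Unsupported collation: " + name);
    }
}
```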

3.3.4 xml:lang attribute

The xml:lang attribute is ignored (following the recommendations in
http://www.w3.org/TR/xquery-operators, 7.3 Equality and Comparison of
Strings).




Igor Hersht
XSLT Development
IBM Canada Ltd., 8200 Warden Avenue, Markham, Ontario L6G 1C7
Office D2-260, Phone (905)413-3240 ; FAX  (905)413-4839
----- Forwarded by Igor Hersht/Toronto/IBM on 11/10/2003 05:29 PM -----


From:     Igor Hersht/Toronto/IBM@IBMCA
Date:     11/10/2003 04:50 PM
To:       xalan-dev@xml.apache.org
cc:       xalan-dev@xml.apache.org
Subject:  Re: A package for code common for XSLTC, xalan interpretive and
          XQuery.
Please respond to xalan-dev

I think we have two issues here: packaging and internationalization (in
particular collation) specs.
I would go with org.apache.xalan.common.internalization for the common
code.
Internationalization (in particular collation) specs leave a lot of
implementation-defined freedom. As I understand it, the specs cannot be
more specific because they depend on the underlying implementations. After
a lot of discussion we came up with a specific spec
(See attached file: collation.doc)

which I think we should implement for all our processors (xalan
interpretive, XSLTC, xalan C++). (Actually, I wrote a draft implementation
of the specs for XSLTC.) The specs could be changed in the future if
XSLT 2.0 becomes less ambiguous (e.g. with respect to the default
collation).
I also think that not only collation but also the lang attribute could be
more specific and have common rules across the different xsl elements.

Our specs are in draft form and obviously not perfect.
Discussion would be appreciated.


Igor Hersht
XSLT Development
IBM Canada Ltd., 8200 Warden Avenue, Markham, Ontario L6G 1C7
Office D2-260, Phone (905)413-3240 ; FAX  (905)413-4839



From:     david_marston@us.ibm.com
Date:     11/10/2003 02:34 PM
To:       xalan-dev@xml.apache.org
cc:
Subject:  Re: A package for code common for XSLTC, xalan interpretive and
          XQuery.
Please respond to xalan-dev





Igor Hersht wrote:
>Maybe it could be good idea to have package
>(e.g. org.apache.xalan.common) where we could have such code
[for internationalization]
>as well as any code which is common for
>XSLT2.0, XPath 2.0, XQuery 1.0 and XPath 2.0 Functions and Operators.

Given that all the above specs treat collating as a special module,
I like the idea of isolating collators. We might even want to make it
easy for others to write special-purpose collators. With the proper
naming conventions, they could use their own collators in the F&O
functions that allow reference to a collator by "name" (actually by
URI). We also need to define the URIs for Xalan collators.
.................David Marston



**** Attachment collation.doc has been removed from this note on 10
November 2003 by Igor Hersht ****