<character>

Posted to fop-dev@xmlgraphics.apache.org by "Peter B. West" <pb...@powerup.com.au> on 2002/09/26 16:40:58 UTC

Fopdevs,

Any comments on the representation and parsing of <character> type 
attributes would be gratefully received.

Peter
-- 
Peter B. West  pbwest@powerup.com.au  http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org

Re: <character>

Posted by "J.Pietschmann" <j3...@yahoo.de>.

Peter B. West wrote:
>> Just for curiosity: what should happen if the following snippet
>> is used:
>>  <fo:page-sequence master-reference="font-size" font-size="20pt">
>>    <fo:flow font-size="from-parent(from-parent('master-reference'))"/>
> 
> 
> This looks OK.

I see potential for an Obfuscated FO Code Contest :-)

J.Pietschmann



Re: <character>

Posted by "Peter B. West" <pb...@powerup.com.au>.

Joerg,

Aside from the ambiguity Arved has highlighted concerning the use of the 
term "code point", I was thinking about the recent decision to hack the 
spec so that 'format="1."' becomes acceptable.  I already find the 
automatic conversion of NCNames to strings distasteful enough.  Users 
develop bad habits from that sort of under-the-hood convenience, and 
that leads to loud complaints about all those fussy quotes when they 
become essential.

If I take <character> to be a string literal of length 1, the automatic 
conversion of NCNames of length 1 would allow, say,

<fo:character character="A"/>

So when users try

<fo:character hyphenation-character="-"/>

they are understandably upset when the parser barfs.

My inclination is to do away with the automatic conversion and 
specifically allow an NCName (with explicit conversion) alongside a 
literal when it is considered safe, as it is not in the case of 
<character> or format.  After all, <string> and <character> don't occur 
all that frequently.
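
The asymmetry between "A" and "-" can be sketched as follows. This is
only an illustration: the pattern is a simplified, ASCII-only
approximation of the XML NCName production, and is_ncname and
auto_convert are hypothetical names, not FOP code.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production.
# Assumption: the real production also admits many non-ASCII letters.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname(text: str) -> bool:
    """Return True if text matches the simplified NCName pattern."""
    return NCNAME_ASCII.fullmatch(text) is not None


def auto_convert(attribute_value: str) -> str:
    """Mimic an automatic NCName-to-string conversion (hypothetical helper)."""
    if is_ncname(attribute_value):
        return attribute_value
    raise ValueError(f"not an NCName: {attribute_value!r}")


print(is_ncname("A"))   # a one-letter NCName is converted silently
print(is_ncname("-"))   # a hyphen cannot start an NCName
print(is_ncname("1."))  # neither can a digit, hence the format="1." hack
```

Under this reading, character="A" goes through the conversion while
hyphenation-character="-" is rejected, which is the surprise described
above.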

See below.

J.Pietschmann wrote:
> 
> According to 5.11 "Property Datatypes", the value is a single
> Unicode character. I believe the representation is an
> unceremonious single Unicode character, or an NCName whose
> string representation has the length 1. I'd parse such
> attributes as an expression resulting in a string, and
> bomb if the string is longer than 1.
> This would accept
>  character="'a'"
>  character="1 + 1"
>  character="from-parent('font-size') - 12"
> which may upset purists, or not.
> An alternative would be to use a custom parser, which accepts
> either a single character (NCName of length 1) or any of the
> functions inherited-property-value(NCName),
> from-parent( NCName), from-nearest-specified-value( NCName)
> and from-table-column( NCName)
> (might even make a bit of sense for hyphenation-char and
> for fo:character's character in very, very strange cases)

A custom parser should not be necessary.  The <character> restriction 
could be expressed as a constraint on the <string> result, checked at 
the property level.
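
A minimal sketch of such a property-level check, assuming the
expression evaluator has already produced a value. The name
check_character is hypothetical, and rejecting non-string results
outright is one possible policy, not something the spec settles.

```python
def check_character(value: object) -> str:
    """Accept an evaluated property value only if it is a one-character string.

    `value` stands for whatever the generic expression evaluator returned;
    how that evaluator works is outside this sketch.
    """
    if not isinstance(value, str):
        raise TypeError(f"<character> needs a string result, got {type(value).__name__}")
    if len(value) != 1:
        raise ValueError(f"<character> needs exactly one character, got {len(value)}")
    return value


print(check_character("a"))
```

Note that len() here counts code points, so a base character followed
by a combining mark would be rejected by this check.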

> Just for curiosity: what should happen if the following snippet
> is used:
>  <fo:page-sequence master-reference="font-size" font-size="20pt">
>    <fo:flow font-size="from-parent(from-parent('master-reference'))"/>

This looks OK.
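
One reading of why the snippet resolves, assuming the string returned
by the inner call is accepted as the property name for the outer call.
The dictionary and the from_parent helper are a toy model for
illustration, not the FOP property machinery.

```python
def from_parent(parent_properties: dict, property_name: str) -> str:
    """Toy model of from-parent(): look the property up on the parent FO."""
    return parent_properties[property_name]


# Properties specified on the fo:page-sequence, the parent of fo:flow.
page_sequence = {"master-reference": "font-size", "font-size": "20pt"}

inner = from_parent(page_sequence, "master-reference")  # yields "font-size"
outer = from_parent(page_sequence, inner)               # looks up font-size

print(inner, outer)
```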

Peter
-- 
Peter B. West  pbwest@powerup.com.au  http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"


Re: <character>

Posted by "J.Pietschmann" <j3...@yahoo.de>.

Peter B. West wrote:
> Fopdevs,
> 
> Any comments on the representation and parsing of <character> type 
> attributes would be gratefully received.

According to 5.11 "Property Datatypes", the value is a single
Unicode character. I believe the representation is an
unceremonious single Unicode character, or an NCName whose
string representation has the length 1. I'd parse such
attributes as an expression resulting in a string, and
bomb if the string is longer than 1.
This would accept
  character="'a'"
  character="1 + 1"
  character="from-parent('font-size') - 12"
which may upset purists, or not.
An alternative would be to use a custom parser, which accepts
either a single character (NCName of length 1) or any of the
functions inherited-property-value(NCName),
from-parent( NCName), from-nearest-specified-value( NCName)
and from-table-column( NCName)
(might even make a bit of sense for hyphenation-char and
for fo:character's character in very, very strange cases)
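
The custom-parser alternative could look roughly like this. It is a
sketch under assumptions: the NCName pattern is a simplified ASCII-only
approximation, parse_character_attribute is a hypothetical name, and
the tuple it returns is just one convenient representation.

```python
import re

# The four property-value functions named above.
ALLOWED_FUNCTIONS = (
    "inherited-property-value",
    "from-parent",
    "from-nearest-specified-value",
    "from-table-column",
)

# Simplified ASCII-only NCName; the full XML production is wider.
_NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
_CALL = re.compile(
    r"\s*(?P<name>" + "|".join(map(re.escape, ALLOWED_FUNCTIONS)) + r")"
    r"\(\s*(?P<arg>" + _NCNAME + r")\s*\)\s*"
)


def parse_character_attribute(text: str):
    """Return ("char", c) or ("call", function_name, property_name)."""
    if len(text) == 1:
        # Any single character is taken literally, including "-".
        return ("char", text)
    match = _CALL.fullmatch(text)
    if match:
        return ("call", match.group("name"), match.group("arg"))
    raise ValueError(f"not a <character> value: {text!r}")


print(parse_character_attribute("-"))
print(parse_character_attribute("from-parent( hyphenation-character)"))
```

With this shape, character="1 + 1" is rejected instead of being
evaluated, which is the trade-off against the generic expression
approach.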

Just for curiosity: what should happen if the following snippet
is used:
  <fo:page-sequence master-reference="font-size" font-size="20pt">
    <fo:flow font-size="from-parent(from-parent('master-reference'))"/>

J.Pietschmann


Re: <character>

Posted by Tony Graham <To...@Sun.COM>.

Peter B. West wrote at 28 Sep 2002 00:39:34 +1000:
...
 > Tony Graham wrote:
...
 > > Section 5.11, Property Datatypes, trumps the individual property
 > > definitions, since Section 5.11 defines "the syntax for specifying the
 > > datatypes usable in property values".  It says "A single Unicode
 > > character."
 > 
 > Ok, so it's a character.  How, then, is it represented?  Is it also a 
 > <string> (of length one), or is it just a literal (length 1), or just an 
 > NCName (length 1), or is it something else?  What does it look like, and 
 > how is the parser going to handle it?

A character is a character, and you should go to XML 1.0 for the
definition of a character.

Also, "parser" is ambiguous in this context as well as having no XML
or XSL meaning.  XML defines an XML processor, which is often called a
"parser" for historical reasons, and the XSL Recommendation uses
"parse" without designating a thing called a "parser".

 > ...
 > 
 > >  > So IMO the spec is currently very vague on this.
 > > 
 > > Then write to xsl-editors@w3.org asking for a clarification.
 > 
 > Nice dry wit you have Tony.

That was a serious suggestion.  You do get an answer eventually, even
if you don't like the answer.

Regards,


Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin                mailto:tony.graham@sun.com
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708


Re: <character>

Posted by "J.Pietschmann" <j3...@yahoo.de>.

Peter B. West wrote:
> Ok, so it's a character.  How, then, is it represented?  Is it also a 
> <string> (of length one), or is it just a literal (length 1), or just an 
> NCName (length 1), or is it something else?

Sorry for having sidetracked you with the NCName stuff,
it is too restricted. For example, you can't parse
hyphenation-char="-" as NCName.
A "literal char" is problematic too, as is the "code point"
formulation: the problem of combining marks has already
been mentioned, and then there are non-baseplane chars
(consisting of two "code points").
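
To make the two cases concrete, here is a small sketch of how "length
1" depends on what is counted. In Unicode terms a character outside the
base plane is a single code point, but it takes two 16-bit code units
in UTF-16, which is what a Java String counts. The helper utf16_units
is a hypothetical name.

```python
def utf16_units(text: str) -> int:
    """Count UTF-16 code units, as a UTF-16 based string type would."""
    return len(text.encode("utf-16-le")) // 2


base_plus_mark = "e\u0301"      # "e" followed by COMBINING ACUTE ACCENT
musical_g_clef = "\U0001D11E"   # a character outside the base plane

# Python's len() counts code points.
print(len(base_plus_mark), utf16_units(base_plus_mark))
print(len(musical_g_clef), utf16_units(musical_g_clef))
```

So a "length 1" test passes or fails for the same character depending
on whether it counts code points, code units, or what the user sees.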

J.Pietschmann


Re: <character>

Posted by "Peter B. West" <pb...@powerup.com.au>.

Tony,

Thanks for responding.  See below.

Tony Graham wrote:
> Arved Sandstrom wrote at 26 Sep 2002 19:50:01 -0300:
>  > Tony Graham says that <character> should be a Unicode character, or Char. As
>  > in the actual real, encoded thing.
> 
> Empirical evidence suggests that is the general understanding:
> grepping the XSL CR test suite shows everybody, FOP included, using
> literal characters.
> 
>  > Problem being, one property with a <character> datatype is defined in XSLT,
>  > which actually says that it's a Char. "hyphenation-separator" merely says
>  > that it's a specification of a Unicode character. I guess that could be
>  > interpreted the same way.
>  > 
>  > But <character> for the "character" property says _code point_. And that is
>  > an integer value.
> 
> Section 5.11, Property Datatypes, trumps the individual property
> definitions, since Section 5.11 defines "the syntax for specifying the
> datatypes usable in property values".  It says "A single Unicode
> character."

Ok, so it's a character.  How, then, is it represented?  Is it also a 
<string> (of length one), or is it just a literal (length 1), or just an 
NCName (length 1), or is it something else?  What does it look like, and 
how is the parser going to handle it?

...

>  > So IMO the spec is currently very vague on this.
> 
> Then write to xsl-editors@w3.org asking for a clarification.

Nice dry wit you have Tony.

Peter
-- 
Peter B. West  pbwest@powerup.com.au  http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"



RE: <character>

Posted by Tony Graham <To...@Sun.COM>.

Arved Sandstrom wrote at 26 Sep 2002 19:50:01 -0300:
 > Tony Graham says that <character> should be a Unicode character, or Char. As
 > in the actual real, encoded thing.

Empirical evidence suggests that is the general understanding:
grepping the XSL CR test suite shows everybody, FOP included, using
literal characters.

 > Problem being, one property with a <character> datatype is defined in XSLT,
 > which actually says that it's a Char. "hyphenation-separator" merely says
 > that it's a specification of a Unicode character. I guess that could be
 > interpreted the same way.
 > 
 > But <character> for the "character" property says _code point_. And that is
 > an integer value.

Section 5.11, Property Datatypes, trumps the individual property
definitions, since Section 5.11 defines "the syntax for specifying the
datatypes usable in property values".  It says "A single Unicode
character."

Now, the interesting if so far theoretical case is what do you do if
you want a hyphenation-separator character that you can only represent
in Unicode as the combination of a base character and one or more
combining marks?  What if your precomposed character gets normalised
to a base character and a combining mark before the XSL processor sees
it?
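
Both situations can be reproduced with the standard unicodedata module.
This is only an illustration of the normalisation behaviour; it says
nothing about what an XSL processor should do with the result.

```python
import unicodedata

# A precomposed character that gets decomposed before the processor sees it.
precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = unicodedata.normalize("NFD", precomposed)
print(len(precomposed), len(decomposed))

# A user-perceived character with no precomposed form at all:
# "n" followed by COMBINING DIAERESIS stays two code points even in NFC.
n_diaeresis = unicodedata.normalize("NFC", "n\u0308")
print(len(n_diaeresis))
```

In the first case recomposing with NFC would restore a single code
point; in the second no normalisation form can, so a strict
one-code-point rule excludes the character entirely.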

 > So IMO the spec is currently very vague on this.

Then write to xsl-editors@w3.org asking for a clarification.

Regards,


Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin                mailto:tony.graham@sun.com
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708


Re: <character>

Posted by "Peter B. West" <pb...@powerup.com.au>.

Arved,

Thanks for this.  I vaguely remembered some discussion about this, but I 
went looking in the xsl-editors archive.  That _code point_ had me 
puzzled as well.  I'll be interested in some feedback on this from the 
editors.  See also my response to Joerg.

Peter

Arved Sandstrom wrote:
>>From: Peter B. West [mailto:pbwest@powerup.com.au]
>>
>>Fopdevs,
>>
>>Any comments on the representation and parsing of <character> type
>>attributes would be gratefully received.
> 
> 
> This came up on www-xsl-fo, because Eric Bischoff and myself had the same
> question.
> 
> Tony Graham says that <character> should be a Unicode character, or Char. As
> in the actual real, encoded thing.
> 
> Problem being, one property with a <character> datatype is defined in XSLT,
> which actually says that it's a Char. "hyphenation-separator" merely says
> that it's a specification of a Unicode character. I guess that could be
> interpreted the same way.
> 
> But <character> for the "character" property says _code point_. And that is
> an integer value.
> 
> So IMO the spec is currently very vague on this.

-- 
Peter B. West  pbwest@powerup.com.au  http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"



RE: <character>

Posted by Arved Sandstrom <Ar...@chebucto.ns.ca>.

> -----Original Message-----
> From: Peter B. West [mailto:pbwest@powerup.com.au]
> Sent: September 26, 2002 11:41 AM
> To: fop-dev
> Subject: <character>
>
> Fopdevs,
>
> Any comments on the representation and parsing of <character> type
> attributes would be gratefully received.

This came up on www-xsl-fo, because Eric Bischoff and myself had the same
question.

Tony Graham says that <character> should be a Unicode character, or Char. As
in the actual real, encoded thing.

Problem being, one property with a <character> datatype is defined in XSLT,
which actually says that it's a Char. "hyphenation-separator" merely says
that it's a specification of a Unicode character. I guess that could be
interpreted the same way.

But <character> for the "character" property says _code point_. And that is
an integer value.

So IMO the spec is currently very vague on this.
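
The difference between the two readings is small but real, as this
sketch shows: the same hyphen is either the one-character string "-" or
the integer 45. The helper names are hypothetical and nothing here
reflects how the spec intends the attribute to be written.

```python
def code_point_of(character: str) -> int:
    """Return the integer code point of a one-character string."""
    if len(character) != 1:
        raise ValueError("expected exactly one character")
    return ord(character)


def character_of(code_point: int) -> str:
    """Return the one-character string for an integer code point."""
    return chr(code_point)


print(code_point_of("-"))        # the "code point" reading: an integer
print(character_of(0x2010))      # the "Char" reading: the character itself
```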

Regards,
Arved

