Posted to fop-users@xmlgraphics.apache.org by DA Shetland <ds...@twcny.rr.com> on 2007/01/15 15:47:53 UTC

SHY characters in user mode

To fop folks interested in hyphens - or not.

I have been following with interest the development of ideas in fop-dev 
for hyphenation implementation, and have started several times to post 
this to that list, but my immediate issue is at the user level.

In a data set for which I have recently been working on an XML to PDF 
process using XSL-FOP, I just recently was reminded of the SHY character 
when I noticed the word "rec-ords" right in the middle of a sentence in 
the PDF output.  It turns out to be a SHY.

I am not using hyphenation at this time.

From my 20+ years of working with documentation systems, it seems to me
the behavior to be expected here is simple (although I realize the 
implementation issues can be very troublesome).  The SHY character 
should disappear from the containing string - always.  In fact SHY is 
the character that is not a character - it is a one character size 
processing instruction that happens to enjoy a code point in character 
tables, but strictly speaking, it doesn't even need a glyph (we have a 
code point for the hyphen).  In every case, hyphenation available on or 
not, turned on or not, the SHY needs to disappear from the string - of 
course, if hyphenation is available, the location needs to be remembered 
for later use.

So my question is--
Is there some way to explicitly suppress the SHY?
Is there something wrong with my installation?

Thanks for any thoughts.

Dave Shetland
Programmer/Analyst
Legal Information Institute
Cornell Law School


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org


Re: SHY characters in user mode

Posted by DA Shetland <ds...@twcny.rr.com>.
Oh, yes, we know about ad-hoc filters :-)
Guess that is what I'll do for now, then, instead of messing with the 
trunk version.
Just a suggestion - hyphenation is such a huge subject - maybe there 
could be an early version that just does the de-SHY function really 
well. No response needed - just a thought.
Thanks so much for a lot of amazing work, and good luck with the hyphens!
-d-

J.Pietschmann wrote:
> DA Shetland wrote:
>> Is there some way to explicitly suppress the SHY?
>
> FOP currently doesn't do this for you, but you can always try
> to preprocess the data before it gets into FOP. Deleting soft
> hyphens should be reasonably easy.
>
> The SHY should vanish automatically once the implementation
> in FOP works correctly :-)
>
>> Is there something wrong with my installation?
>
> I don't understand this question.
>
> J.Pietschmann


Re: SHY characters in user mode

Posted by "J.Pietschmann" <j3...@yahoo.de>.
DA Shetland wrote:
> Is there some way to explicitly suppress the SHY?

FOP currently doesn't do this for you, but you can always try
to preprocess the data before it gets into FOP. Deleting soft
hyphens should be reasonably easy.
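A minimal sketch of such a pre-filter, written in Python purely for illustration. It assumes the SHY arrives as a literal U+00AD character in UTF-8 input; a character reference such as &#xAD; in the raw XML would not be caught. The names strip_soft_hyphens and filter_file are hypothetical helpers, not part of FOP.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN (SHY)


def strip_soft_hyphens(text: str) -> str:
    """Return text with every literal U+00AD removed."""
    return text.replace(SOFT_HYPHEN, "")


def filter_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    """Copy src_path to dst_path line by line, dropping soft hyphens."""
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            dst.write(strip_soft_hyphens(line))
```

Running filter_file on the source XML (or on the generated FO file) before handing it to FOP keeps the workaround outside the stylesheets, so it is easy to drop once FOP handles SHY itself.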

The SHY should vanish automatically once the implementation
in FOP works correctly :-)

> Is there something wrong with my installation?

I don't understand this question.

J.Pietschmann



Re: SHY characters in user mode

Posted by DA Shetland <ds...@twcny.rr.com>.
Well, I *should* be up-shifted to XSLT-2 by now (my "cookbook" certainly 
has) - this might be just the final push I need  :-)
Your solution really puts the patch exactly where it belongs, suitable 
for tracking ongoing fop-work.
Thanks again.
-Dave Shetland

Abel Braaksma wrote:
> DA Shetland wrote:
>
>> My saying that strings have to be de-SHYed up front was as much as 
>> saying that I don't trust the presentation layer to do what it needs 
>> to do for all categories of control characters, combining characters, 
>> etc. of which SHY is one.
>>
>> So I will pre-filter and watch for news. 
>
>
> In the case you are using XSLT 2 (and in the event you haven't applied 
> a filter yet), here is an easy solution you can use for all your 
> templates:
>
> <xsl:output use-character-maps="remove-SHY" />
>
> <xsl:character-map name="remove-SHY">
>    <xsl:output-character character="&#xAD;" string=""/>
> </xsl:character-map>
>
>
>
> If you place it in an include/import xslt file, you can easily expand 
> the list with other control characters. Note that this method is much 
> faster (should be) than regex replacing.
>
> Cheers
> -- Abel Braaksma


Re: SHY characters in user mode

Posted by Abel Braaksma <ab...@xs4all.nl>.
J.Pietschmann wrote:
> Abel Braaksma wrote:
>> In the case you are using XSLT 2 (and in the event you haven't 
>> applied a filter yet), here is an easy solution you can use for all 
>> your templates:
>>
>> <xsl:output use-character-maps="remove-SHY" />
>
> I was under the impression that output maps are only applied if
> the transformation result is serialized, which is not the case
> if the FO process is chained to the transformation using SAX
> events.
> Can anybody confirm that output maps work even in the latter case?

I never thought of that, but you are right, of course. Though I am not 
certain how processors actually deal with this when you do an in-memory 
transformation, processors are allowed to ignore xsl:output and 
xsl:character-map. Here's the quote from the XSLT 2 REC that says so:

"When serialization is not being performed, either because the 
implementation does not support the serialization option, or because the 
user is executing the transformation in a way that does not invoke 
serialization, then the content of the xsl:output and xsl:character-map 
declarations has no effect. Under these circumstances the processor may 
report any errors in an xsl:output or xsl:character-map declaration, or 
in the serialization attributes of xsl:result-document, but is not 
required to do so."

However, the workaround is simple (but a bit more time-consuming for the 
processor to perform, and perhaps to implement):

<xsl:value-of select="replace( $your-shy-value, '&#xAD;', '' )" />

You'll have to do this for every text node containing a SHY. If you find 
that too cumbersome (I would), another solution is to wrap your 
result in a temporary result tree, and apply an identity template to it:

<xsl:template match="/">
   <xsl:variable name="temp-result">
        <!-- apply your normal templates as they are now -->
        <xsl:apply-templates />
   </xsl:variable>
   <!-- second pass over the temporary tree, removing soft hyphens -->
   <xsl:apply-templates select="$temp-result/*" mode="remove-SHY" />
</xsl:template>

<!-- explicit priority: text() and node() share the same default priority,
     so without it the copy template below could win for text nodes -->
<xsl:template match="text()" mode="remove-SHY" priority="1">
    <xsl:value-of select="replace( . , '&#xAD;', '' )" />
</xsl:template>

<!-- copy template -->
<xsl:template match="node() | @*" mode="remove-SHY">
    <xsl:copy>
        <xsl:apply-templates select="node() | @*" mode="#current" />
    </xsl:copy>
</xsl:template>

That way you only have to change your existing code in one spot (but 
note that the serialization option is less processor-intensive, so if 
you do some serialization before applying FOP, go for that option).
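When the transformation feeds its result onward as SAX events and nothing is serialized, a third place to remove the SHY is a filter sitting in the event stream itself. The sketch below shows the idea with Python's xml.sax, purely as an illustration: FOP is a Java application, so a real chain would use the Java SAX equivalent (for example a subclass of org.xml.sax.helpers.XMLFilterImpl). SoftHyphenFilter and strip_shy_via_sax are hypothetical names.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN (SHY)


class SoftHyphenFilter(XMLFilterBase):
    """SAX filter that drops U+00AD from character events only."""

    def characters(self, content):
        cleaned = content.replace(SOFT_HYPHEN, "")
        if cleaned:  # skip events that contained nothing but SHY
            super().characters(cleaned)


def strip_shy_via_sax(xml_text: str) -> str:
    """Parse xml_text, pass the events through the filter, and re-serialize."""
    out = io.StringIO()
    reader = SoftHyphenFilter(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.StringIO(xml_text))
    return out.getvalue()
```

Note that this filter only touches text content; a SHY inside an attribute value passes through unchanged, which is usually what you want for FO input.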

Cheers,
-- Abel Braaksma
   http://www.nuntia.nl



Re: SHY characters in user mode

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Abel Braaksma wrote:
> In the case you are using XSLT 2 (and in the event you haven't applied a 
> filter yet), here is an easy solution you can use for all your templates:
> 
> <xsl:output use-character-maps="remove-SHY" />

I was under the impression that output maps are only applied if
the transformation result is serialized, which is not the case
if the FO process is chained to the transformation using SAX
events.
Can anybody confirm that output maps work even in the latter case?

J.Pietschmann



Re: SHY characters in user mode

Posted by Abel Braaksma <ab...@xs4all.nl>.
DA Shetland wrote:

> My saying that strings have to be de-SHYed up front was as much as 
> saying that I don't trust the presentation layer to do what it needs 
> to do for all categories of control characters, combining characters, 
> etc. of which SHY is one.
>
> So I will pre-filter and watch for news. 


In the case you are using XSLT 2 (and in the event you haven't applied a 
filter yet), here is an easy solution you can use for all your templates:

<xsl:output use-character-maps="remove-SHY" />

<xsl:character-map name="remove-SHY">
    <xsl:output-character character="&#xAD;" string=""/>
</xsl:character-map>



If you place it in an include/import xslt file, you can easily expand 
the list with other control characters. Note that this method is much 
faster (should be) than regex replacing.

Cheers
-- Abel Braaksma



Re: SHY characters in user mode

Posted by DA Shetland <ds...@twcny.rr.com>.

Abel Braaksma wrote:
> DA Shetland wrote:
>> In fact SHY is the character that is not a character - it is a one 
>> character size processing instruction that happens to enjoy a code 
>> point in character tables, but strictly speaking, it doesn't even 
>> need a glyph (we have a code point for the hyphen).  
>
> There are very many "characters" of this kind. Like the RTL and LTR 
> markers and combining characters.
Yes. But all the ones I can think of pretty much look like the control 
characters or sequences they are.  With the SHY, you have to really look 
carefully to realize that it is one of "those."  Then look again (and 
again :-) to figure out the deeper meaning(s).
>
>> In every case, hyphenation available on or not, turned on or not, the 
>> SHY needs to disappear from the string - of course, if hyphenation is 
>> available, the location needs to be remembered for later use.
>
> It happens that the Unicode consortium defines this "glyph" as a glyph 
> that should indeed not appear (this is a change from Unicode 3 to 
> Unicode 4). But the "of course" in your sentence is not so obvious as 
> it may seem. This article shows an in-depth coverage of the SOFT 
> HYPHEN: 
> http://www.cl.cam.ac.uk/~mgk25/ucs/L2/03155r-kuhn-soft-hyphen.pdf.
Thank you!  Excellent summary.
>
> In short: the soft hyphen should not be removed from the data stream. 
> However, it should not appear in the sentence, unless the sentence 
> must be hyphenated due to a line break.
>
> To support this, the SOFT HYPHEN is a Cf (Other, format) group 
> character that normally has no visible appearance, unless in special 
> situations. It also does not count as a character when the number of 
> characters is counted, nor does it appear in comparison functions (it 
> is ignored).
Again, thank you for the very helpful extension and corrections of my 
thoughts, which were, as usual, drifting into a far too implementation 
orientation.  My saying that strings have to be de-SHYed up front was as 
much as saying that I don't trust the presentation layer to do what it 
needs to do for all categories of control characters, combining 
characters, etc. of which SHY is one.

So I will pre-filter and watch for news.
>
> -- Abel


Re: SHY characters in user mode

Posted by Abel Braaksma <ab...@xs4all.nl>.
DA Shetland wrote:
> In fact SHY is the character that is not a character - it is a one 
> character size processing instruction that happens to enjoy a code 
> point in character tables, but strictly speaking, it doesn't even need 
> a glyph (we have a code point for the hyphen).  

There are very many "characters" of this kind. Like the RTL and LTR 
markers and combining characters.

> In every case, hyphenation available on or not, turned on or not, the 
> SHY needs to disappear from the string - of course, if hyphenation is 
> available, the location needs to be remembered for later use.

It happens that the Unicode consortium defines this "glyph" as a glyph 
that should indeed not appear (this is a change from Unicode 3 to 
Unicode 4). But the "of course" in your sentence is not so obvious as it 
may seem. This article shows an in-depth coverage of the SOFT HYPHEN: 
http://www.cl.cam.ac.uk/~mgk25/ucs/L2/03155r-kuhn-soft-hyphen.pdf.

In short: the soft hyphen should not be removed from the data stream. 
However, it should not appear in the sentence, unless the sentence must 
be hyphenated due to a line break.

To support this, the SOFT HYPHEN is a Cf (Other, format) group character 
that normally has no visible appearance, unless in special situations. 
It also does not count as a character when the number of characters is 
counted, nor does it appear in comparison functions (it is ignored).
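The Cf classification is easy to check. The sketch below uses Python's unicodedata module to confirm the general category of U+00AD and then shows what "not counted, ignored in comparisons" could look like in practice. Plain string length and equality in most languages do not ignore format characters by themselves, so visible_length and equal_ignoring_format are hypothetical helpers written for this illustration.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN (SHY)


def is_format_char(ch: str) -> bool:
    """True for characters in Unicode general category Cf (Other, format)."""
    return unicodedata.category(ch) == "Cf"


def strip_format_chars(text: str) -> str:
    """Drop every Cf character (SHY, LRM, RLM, ...) from text."""
    return "".join(ch for ch in text if not is_format_char(ch))


def visible_length(text: str) -> int:
    """Count characters, leaving Cf characters out of the total."""
    return len(strip_format_chars(text))


def equal_ignoring_format(a: str, b: str) -> bool:
    """Compare two strings as if Cf characters were not there."""
    return strip_format_chars(a) == strip_format_chars(b)
```

With these helpers, "rec" + SHY + "ords" has a raw length of 8 but a visible length of 7, and it compares equal to "records".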

-- Abel




Re: SHY characters in user mode

Posted by DA Shetland <ds...@twcny.rr.com>.
Manuel Mall wrote:
> On Tuesday 16 January 2007 00:02, DA Shetland wrote:
>   
>> Sorry - I am using 0.93 - will try the trunk and get back.
>> -d-
>>
>>     
> SHY support in fop-trunk is a very new addition and any testing and 
> feedback would be much appreciated. It should work as you described, 
> that is SHY being suppressed everywhere and only used to indicate a 
> hyphenated line break possibility. If chosen as a line break a proper 
> hyphen character is put in its place.
Manuel:
Once I got to it, the whole build from trunk thing really was a 
"no-brainer" - sorry for the noise about it.
The great news is that my first SHY behavior test indicates *success* - 
Thank You!
A superficial look-around does not see anything else having gone wrong, 
but I'll just keep using the trunk for now (and we are still in XSL 
development, so are staring pretty closely at the PDF output) and let 
you know of any problems - or radical successes, like this one.
And thanks again to all inside the fop project.
-d-




Re: SHY characters in user mode

Posted by Manuel Mall <ma...@apache.org>.
On Tuesday 16 January 2007 00:02, DA Shetland wrote:
> Sorry - I am using 0.93 - will try the trunk and get back.
> -d-
>
SHY support in fop-trunk is a very new addition and any testing and 
feedback would be much appreciated. It should work as you described, 
that is SHY being suppressed everywhere and only used to indicate a 
hyphenated line break possibility. If chosen as a line break a proper 
hyphen character is put in its place.
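The behavior described above can be illustrated with a toy greedy line breaker. This is only a sketch of the described outcome under simplifying assumptions (fixed-width characters, at most one hyphenation per word, overlong words left as they are); it is not FOP's line-breaking algorithm, and break_lines is a hypothetical function.

```python
from typing import List

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN (SHY)


def break_lines(text: str, width: int) -> List[str]:
    """Greedy line filling: SHY never shows up in the output, except that a
    SHY chosen as the break point is replaced by a real hyphen."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        parts = word.split(SOFT_HYPHEN)  # pieces between break opportunities
        plain = "".join(parts)           # the word with all SHYs suppressed
        candidate = plain if not current else current + " " + plain
        if len(candidate) <= width:
            current = candidate
            continue
        # The whole word does not fit: try the longest head that ends at a SHY.
        for i in range(len(parts) - 1, 0, -1):
            head = "".join(parts[:i]) + "-"
            trial = head if not current else current + " " + head
            if len(trial) <= width:
                lines.append(trial)
                current = "".join(parts[i:])
                break
        else:
            # No usable break opportunity: move the word to the next line.
            if current:
                lines.append(current)
            current = plain
    if current:
        lines.append(current)
    return lines
```

For the input "public rec" + SHY + "ords", a width of 20 gives ["public records"], a width of 11 gives ["public rec-", "ords"], and a width of 10 gives ["public", "records"]. In no case does the SHY itself reach the output.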

Manuel



Re: SHY characters in user mode

Posted by DA Shetland <ds...@twcny.rr.com>.
Sorry - I am using 0.93 - will try the trunk and get back.
-d-

Jeremias Maerki wrote:
> You should start by saying which FOP version you are using. For the
> recent SHY stuff to work you need the latest code from the Subversion
> repository (FOP Trunk). It is not available in 0.93.



Re: SHY characters in user mode

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
You should start by saying which FOP version you are using. For the
recent SHY stuff to work you need the latest code from the Subversion
repository (FOP Trunk). It is not available in 0.93.


Jeremias Maerki

