Posted to fop-users@xmlgraphics.apache.org by David Kelly <dk...@scriptorium.com> on 2007/07/20 20:55:31 UTC

Lines not breaking with zero width space characters inserted

Greetings,

I have looked at the threads relating to line break issues as well as looking at various documentation on how to force
line breaks.  From what I have read, it appears that inserting a zero-width space character in a long string should
enable the string to be broken at the ZWS character when it spans a column edge (in my case, a very narrow table
column).  I have succeeded in inserting various kinds of "breakable" characters after underscores (which are common in
the strings I'm trying to break).  However, the only character that actually causes a break is a full space character,
&#x0020; .  Other characters I've tried include the ZWS U+200B, soft hyphen U+00AD, and hair space U+200A -- but these
do not appear to enable line breaking for me.

I am using FOP 0.93 in a Windows environment. I'm also using an en-US hyphenation table from OFFO.

What I would like to find out is:

1. Should the ZWS work, and if so, what might cause it not to work?

2. Are there alternatives for creating a line-break after an underscore?  (I tried customizing a hyphenation pattern in
the en_US.xml hyphenation file and pointing the cfg.xml <hyphenation_base> at its directory, but this did not
appear to work for me either - the output shows the hyphenation file is being read, but the new pattern apparently is not
being processed.  I tried changing the hyphen character in the en-US.xml file, and that did not get picked up either.)

Sorry to put more than one issue in a message - I've been working on this problem for way too long with little result.

Any pointers would be appreciated.

Thanks,
David Kelly


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org


Re: Lines not breaking with zero width space characters inserted

Posted by Paul Vinkenoog <pa...@vinkenoog.nl>.
Hi David,

I re-inserted the ZWSPs in your XSL-FO table (they didn't survive the message transport, but it's obvious where they had been), included it in a small test file, built the PDF and the line wrapped nicely after one of the underscores. When I removed the ZWSPs again, I got an overflow. It didn't make any difference whether hyphenation was on or off.

This was done with a home-built FOP, based on the trunk as it was around May 15th. Repeating the exercise with FOP 0.93, I got an overflow regardless of whether ZWSPs were present, but only if hyphenation was on. With hyphenation off (and ZWSPs present, of course) everything worked fine with 0.93.

So if you want hyphenation on and functional ZWSP-breaking at the same time, the solution is to upgrade to the current fop-trunk.
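
A test file of the kind described here might look like the following minimal sketch. The page size, the master name "test", and the 25mm block-container standing in for the narrow table column are assumptions chosen for illustration; they are not taken from the original messages. The ZWSPs are written as numeric character references so that they stay visible in the source.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- Page geometry and master name are placeholders. -->
    <fo:simple-page-master master-name="test"
                           page-width="210mm" page-height="297mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="test">
    <fo:flow flow-name="xsl-region-body">
      <!-- Narrow container used as a stand-in for the table column. -->
      <fo:block-container width="25mm">
        <!-- Block 1: ZWSP (U+200B) after each underscore, hyphenation off. -->
        <fo:block font-size="8pt" hyphenate="false">int clock_&#x200B;gettime(clockid_&#x200B;b clock_&#x200B;id, struct timespec *tp);</fo:block>
        <!-- Block 2: same text without ZWSPs, for comparison. -->
        <fo:block font-size="8pt" hyphenate="false" space-before="6pt">int clock_gettime(clockid_b clock_id, struct timespec *tp);</fo:block>
      </fo:block-container>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Going by the results reported above, the first block would be expected to wrap after an underscore and the second to overflow; setting hyphenate="true" on both blocks would be the way to reproduce the 0.93 behaviour.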


Hope this helps!
Paul Vinkenoog



Re: Lines not breaking with zero width space characters inserted

Posted by DavidJKelly <dk...@scriptorium.com>.
Paul, 

Below I am including the problem source file and the XSL-FO templates that
add the ZWSP.  In response to your suggestions/questions:

> So if yours are inserted via <fo:character>s, change your setup.

I was trying to surround the ZWSP with <fo:inline> but it did not work
either way.

> Second, are you sure the ZWSPs end up correctly in the XSL-FO file?
> Have you actually "seen" them sitting there, with an editor that can
> detect them?

Yes, using oXygen I can read the character codes.  ZWSPs are definitely in
the FO file, and using the Select tool in Acrobat I can step through the
characters and see that an invisible zero-width character of some sort
follows the underscore in the PDF.

> Third, there were some ZWSP-related problems if you embedded fonts in
> the PDF (as opposed to using a standard font). These exist in FOP
> 0.93, but have been fixed in the trunk. So that's another thing you
> might try. 

I checked our FOP config file. We are not embedding fonts that I am aware
of.

Here is the source. Row 1, entry#2 causes an overflow condition with this
string: clock_gettime(clockid_b clock_id
______________
<dita>
     <topic id="topic_D4D77AF22FED4DC8B56A18008FD25CDn">
          <title>Usage Guidelines</title>
          <body>
               <table id="table_43E01DE5DE9B42F3859E7F6998012E4n">
                    <tgroup cols="5">
                         <colspec colname="col1" colnum="1" colwidth="*"/>
                         <colspec colname="col2" colnum="2" colwidth="*"/>
                         <colspec colname="col3" colnum="3" colwidth="*"/>
                         <colspec colname="col4" colnum="4" colwidth="*"/>
                         <colspec colname="col5" colnum="5" colwidth="*"/>
                         <tbody>
                              <row>
                                   <entry valign="top">
                                        <p>Clock_gettime</p>
                                   </entry>
                                   <entry valign="top">
                                        <p>int clock_gettime(clockid_b
clock_id, struct timespec *tp);</p>
                                   </entry>
                                   <entry valign="top">
                                        <p>None</p>
                                   </entry>
                                   <entry valign="top">
                                        <p>SE</p>
                                   </entry>
                                   <entry valign="top">
                                        <p>Retrieves seconds since the
epoch.</p>
                                   </entry>
                              </row>
                              <row>
                                   <entry valign="top">
                                        <p>alarm</p>
                                   </entry>
                                   <entry valign="top">
                                        <p>unsigned alarm(unsigned
seconds);</p>
                                   </entry>
                                   <entry valign="top">
                                        <p>None</p>
                                   </entry>
                                   <entry valign="top">
                                        <p>SE</p>
                                   </entry>
                                   <entry valign="top">
                                        <p>Schedules an alarm signal in
seconds.</p>
                                   </entry>
                              </row>
                         </tbody>
                    </tgroup>
               </table>
          </body>
     </topic>
</dita>
_____________
Following is the XSL I use to embed the ZWSP:
_____________
  <xsl:template match="*[contains(@class,' topic/p ')]">
    <xsl:choose>
         <xsl:when test="parent::entry">
              <fo:block xsl:use-attribute-sets="entry">
                   <xsl:call-template name="replace">
                        <xsl:with-param name="string" select="."/>
                   </xsl:call-template> 
              </fo:block>
         </xsl:when>
      </xsl:choose>
</xsl:template>
     <xsl:template name="replace">
          <xsl:param name="string"/>
          <xsl:variable name="old" select="'_'"/>
          <xsl:variable name="new">_&#x200B;</xsl:variable>
          <xsl:choose>
               <xsl:when test="contains( $string, $old )">
                    <xsl:value-of select="concat(substring-before($string, $old), $new)"/>
                    <xsl:call-template name="replace">
                         <xsl:with-param name="string" select="substring-after($string, $old)"/>
                    </xsl:call-template>
               </xsl:when>
               <xsl:otherwise>
                    <xsl:value-of select="$string"/>
               </xsl:otherwise>
          </xsl:choose>
</xsl:template>
<xsl:attribute-set name="entry">
          <xsl:attribute name="font-size">8pt</xsl:attribute>
          <xsl:attribute name="hyphenate">true</xsl:attribute>
          <xsl:attribute name="language">en</xsl:attribute>
          <xsl:attribute name="country">US</xsl:attribute>
          <xsl:attribute name="wrap-option">wrap</xsl:attribute>
</xsl:attribute-set>
____________
And here is the resultant XSL:FO (only for the problem row):
____________
                              <fo:table table-layout="fixed" width="100%"
space-before="12pt" space-after="10pt" background-color="white"
border-style="solid" border-width="0.5pt" border-color="black">
                                 <fo:table-column
column-width="proportional-column-width(1)"/>
                                 <fo:table-column
column-width="proportional-column-width(1)"/>
                                 <fo:table-column
column-width="proportional-column-width(1)"/>
                                 <fo:table-column
column-width="proportional-column-width(1)"/>
                                 <fo:table-column
column-width="proportional-column-width(1)"/>
                                 <fo:table-body>
                                    <fo:table-row>
                                       <fo:table-cell border-style="solid"
border-width="0.5pt" border-color="black" start-indent="2pt"
background-color="#faf4fa" padding="2pt" display-align="before">
                                          <fo:block hyphenate="true"
language="en" country="US">
                                             <fo:block font-size="8pt"
hyphenate="true" language="en" country="US" wrap-option="wrap">Clock_
gettime</fo:block>
                                          </fo:block>
                                       </fo:table-cell>
                                       <fo:table-cell border-style="solid"
border-width="0.5pt" border-color="black" start-indent="2pt"
background-color="#faf4fa" padding="2pt" display-align="before">
                                          <fo:block hyphenate="true"
language="en" country="US">
                                             <fo:block font-size="8pt"
hyphenate="true" language="en" country="US" wrap-option="wrap">int clock_
gettime(clockid_​t clock_​id, struct timespec *tp);</fo:block>
                                          </fo:block>
                                       </fo:table-cell>
                                       <fo:table-cell border-style="solid"
border-width="0.5pt" border-color="black" start-indent="2pt"
background-color="#faf4fa" padding="2pt" display-align="before">
                                          <fo:block hyphenate="true"
language="en" country="US">
                                             <fo:block font-size="8pt"
hyphenate="true" language="en" country="US"
wrap-option="wrap">None</fo:block>
                                          </fo:block>
                                       </fo:table-cell>
                                       <fo:table-cell border-style="solid"
border-width="0.5pt" border-color="black" start-indent="2pt"
background-color="#faf4fa" padding="2pt" display-align="before">
                                          <fo:block hyphenate="true"
language="en" country="US">
                                             <fo:block font-size="8pt"
hyphenate="true" language="en" country="US" wrap-option="wrap">SE</fo:block>
                                          </fo:block>
                                       </fo:table-cell>
                                       <fo:table-cell border-style="solid"
border-width="0.5pt" border-color="black" start-indent="2pt"
background-color="#faf4fa" padding="2pt" display-align="before">
                                          <fo:block hyphenate="true"
language="en" country="US">
                                             <fo:block font-size="8pt"
hyphenate="true" language="en" country="US" wrap-option="wrap">Retrieves
seconds since the epoch.</fo:block>
                                          </fo:block>
                                       </fo:table-cell>
                                    </fo:table-row>
_____________
Any suggestions would be appreciated!
Regards, 
David
-- 
View this message in context: http://www.nabble.com/Lines-not-breaking-with-zero-width-space-characters-inserted-tf4120139.html#a11743190
Sent from the FOP - Users mailing list archive at Nabble.com.




Re: Lines not breaking with zero width space characters inserted

Posted by Paul Vinkenoog <pa...@vinkenoog.nl>.
Hi David,

> I have looked at the threads relating to line break issues as well
> as looking at various documentation on how to force line breaks.
> From what I have read, it appears that inserting a zero-width space
> character in a long string should enable the string to be broken at
> the ZWS character when it spans a column edge (in my case, a very
> narrow table column).

> I have succeeded in inserting various kinds of "breakable"
> characters after underscores (which are common in the strings I'm
> trying to break).
> However, the only character that actually causes a break is a full
> space character, &#x0020; .

> Other characters I've tried include the ZWS U+200B, soft hyphen
> U+00AD, and hair space U+200A -- but these do not appear to enable
> line breaking for me.  I am using FOP 0.93 in a Windows
> environment. I'm also using an en-US hyphenation table from OFFO.

> What I would like to find out is: 1. Should the ZWS work, and if so,
> what might cause it not to work?

ZWS U+200B should definitely work - I use it all the time. However,
I did some work on this a couple of months ago and discovered that
ZWSPs don't act as potential line breakers if you wrap them in a
<fo:character> instead of inserting them directly into the string.

So if yours are inserted via <fo:character>s, change your setup.
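
The difference between the two setups can be sketched as follows. This is only an illustration of the two forms being contrasted, using a shortened version of the string from this thread; the outer block exists solely to carry the namespace declaration.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- ZWSP inserted directly into the text: reported to allow a break. -->
  <fo:block>clock_&#x200B;gettime</fo:block>

  <!-- ZWSP wrapped in fo:character: reported NOT to allow a break. -->
  <fo:block>clock_<fo:character character="&#x200B;"/>gettime</fo:block>
</fo:block>
```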

Second, are you sure the ZWSPs end up correctly in the XSL-FO file?
Have you actually "seen" them sitting there, with an editor that can
detect them?

Third, there were some ZWSP-related problems if you embedded fonts in
the PDF (as opposed to using a standard font). These exist in FOP
0.93, but have been fixed in the trunk. So that's another thing you
might try.


> 2. Are there alternatives for creating a line-break after an
> underscore?

Beyond the things you mention, I wouldn't know (not that that says a
lot). But it _must_ be possible to get the ZWSPs to work, so let's try
that route first. If the above suggestions don't help, you might
create a small XSL-FO file that doesn't work for you and let me and/or
others try it.


Cheers,
Paul Vinkenoog



Re: Lines not breaking with zero width space characters inserted

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jul 23, 2007, at 15:16, DavidJKelly wrote:

> W/r/t your answer on hyphenation, I have made these two additions at the top
> of the <patterns> section
>
> 7(
> _7
>
> A typical string that is resisting hyphenation or linebreaking is:
>
> clock_​gettime(clockid_​b clock_​id

I've looked a bit closer at the code, and this would indeed not get  
hyphenated currently :(

Either the bracket or the underscore is considered a non-letter
character. If such characters come first, the hyphenator
ignores them. As soon as one follows a letter character, the
hyphenator quits, which in the above example leads to no hyphenation
point because the word "clock" cannot be hyphenated, and
"_gettime(clockid_b" and "_id" will never be considered.

Using ZWSP should work to use Unicode-compliant linebreaking to your  
advantage here, but in that case I would *strongly* advise to switch  
off hyphenation... I have no idea whether it passes through the  
hyphenator, but I can't say for sure that that is not precisely the  
problem in your case, as you're using both. Could cause weird  
effects. I've seen stranger things... Try hyphenating a block with  
preserved linefeeds, for instance.
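
Applied to the "entry" attribute set posted earlier in this thread, switching hyphenation off could look like the sketch below. It assumes the rest of the stylesheet stays as posted and simply drops the language and country attributes, which mainly matter for hyphenation.

```xml
<xsl:attribute-set name="entry"
                   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:attribute name="font-size">8pt</xsl:attribute>
  <!-- Hyphenation off, so only the ZWSPs supply extra break opportunities. -->
  <xsl:attribute name="hyphenate">false</xsl:attribute>
  <xsl:attribute name="wrap-option">wrap</xsl:attribute>
</xsl:attribute-set>
```

The posted FO also sets hyphenate="true" on the enclosing block in each cell. Since hyphenate is an inherited property, the value on the inner block is the one that applies to the text, but it may be cleaner to change both.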


Cheers

Andreas


Re: Lines not breaking with zero width space characters inserted

Posted by DavidJKelly <dk...@scriptorium.com>.
Andreas,

W/r/t your answer on hyphenation, I have made these two additions at the top
of the <patterns> section 

7(     
_7  

A typical string that is resisting hyphenation or linebreaking is:

clock_​gettime(clockid_​b clock_​id




Andreas L Delmelle wrote:
> 
> On Jul 20, 2007, at 20:55, David Kelly wrote:
> 
>> 2. Are there alternatives for creating a line-break after an  
>> underscore?  (I tried customizing a hyphenation pattern in
>> the en_US.xml hyphenation file and pointing the cfg.xml  
>> <hyphenation_base> to point to its directory, but this did not
>> appear to work for me either - output shows the hyphenation file is  
>> being read, but the new pattern apparently is not
>> being processed.  I tried changing the hyphen character in the
>> en-US.xml file, and that did not get picked up either.
> 
> WRT the patterns: how exactly do you alter them? Do note that in  
> principle you can only mark the favorability of a hyphenation point,  
> and that is not an absolute...
> 
> Also, I recently noticed that if you would supply the following  
> sequences to FOP's hyphenation
> 
> AAA,BBB,CCC,DDD
> AAA1BBB2CCC3DDD
> 
> You don't get /any/ hyphenation points. The hyphenator gives up once  
> it encounters the first comma or digit. Maybe a similar thing is  
> happening with underscores... I'd have to check to say for certain
> 
> 





Re: Lines not breaking with zero width space characters inserted

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jul 20, 2007, at 20:55, David Kelly wrote:

Hi

> <snip />
> I am using FOP 0.93 in a Windows environment. I'm also using an
> en-US hyphenation table from OFFO.
>
> What I would like to find out is:
>
> 1. Should the ZWS work, and if so, what might cause it not to work?

Last I heard, someone actually did use ZWSP with 0.93 and there was
still a tiny problem: the renderer inserted a placeholder
glyph # in the result (a bug that was recently fixed in FOP Trunk).

That said, the linebreaking should work, IIRC. No immediate idea on  
why it isn't working in your case... Is it possible to post a small  
fragment of your FO that shows the problem? Say one block with ZWSP  
and one block without? That could already help us a lot in figuring  
this out.

> 2. Are there alternatives for creating a line-break after an  
> underscore?  (I tried customizing a hyphenation pattern in
> the en_US.xml hyphenation file and pointing the cfg.xml  
> <hyphenation_base> to point to its directory, but this did not
> appear to work for me either - output shows the hyphenation file is  
> being read, but the new pattern apparently is not
> being processed.  I tried changing the hyphen character in the
> en-US.xml file, and that did not get picked up either.

WRT the patterns: how exactly do you alter them? Do note that in  
principle you can only mark the favorability of a hyphenation point,  
and that is not an absolute...
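
To illustrate what "favorability" means, here is a sketch of a <patterns> fragment in the TeX-style notation these files use. The two patterns are textbook examples from TeX pattern sets and are shown only for illustration; they are not claimed to be the ones in the OFFO en_US.xml file.

```xml
<patterns>
<!-- A digit between two letters rates a break at that position.        -->
<!-- Odd values favour a break, even values discourage one, and when    -->
<!-- several patterns match the same position the highest value wins.   -->
hy3ph
he2n
</patterns>
```

In a word such as "hyphen", the first pattern favours a break between "hy" and "phen", while the second discourages one between "he" and "n". No pattern forces a break on its own.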

Also, I recently noticed that if you would supply the following  
sequences to FOP's hyphenation

AAA,BBB,CCC,DDD
AAA1BBB2CCC3DDD

You don't get /any/ hyphenation points. The hyphenator gives up once
it encounters the first comma or digit. Maybe a similar thing is
happening with underscores... I'd have to check to say for certain.

As for the hyphen character, that looks 'normal':
I don't think FOP currently takes into account anything other than a
specified "hyphenation-character" property, so changing that in the
en-US.xml file won't have any effect.
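
For reference, the property is set in the FO itself rather than in the hyphenation file. A minimal sketch, in which the tilde is an arbitrary character picked only so that the effect would be easy to spot in the output:

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          hyphenate="true" language="en" country="US"
          hyphenation-character="~">
  Retrieves seconds since the epoch.
</fo:block>
```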



Cheers

Andreas
