Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2007/04/18 20:38:25 UTC

DO NOT REPLY [Bug 42162] New: - hyphenation inside block in FOP works only for pure alphabetical characters

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED
COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=42162>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=42162

           Summary: hyphenation inside block in FOP works only for pure
                    alphabetical characters
           Product: Fop
           Version: 0.93
          Platform: Other
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: general
        AssignedTo: fop-dev@xmlgraphics.apache.org
        ReportedBy: anuja_gok@yahoo.com


Hyphenation does not work correctly when the text in a block contains digits
or commas.

In the example below, only the text in the first table-row is hyphenated
correctly:

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:datetime="http://exslt.org/dates-and-times" writing-mode="lr-tb"
text-align="start" role="html">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="all-pages" page-width="8.5in" page-height="11in">
      <fo:region-body margin-top="1in" margin-right="0.25in" margin-bottom="1.5in" margin-left="   0.25in" page-width="8.5in" page-height="11in"/>
      <fo:region-before region-name="page-header" extent="1in" display-align="before"/>
      <fo:region-after region-name="page-footer" extent="1.5in" display-align="after"/>
      <fo:region-start extent="0.25in"/>
      <fo:region-end extent="1.5in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="all-pages">
    <fo:static-content flow-name="page-header">
      <fo:block font-size="14pt" text-align="center" hyphenate="true" 
language="en" space-before.conditionality="retain" space-before="0.5in"/>
    </fo:static-content>
    <fo:static-content flow-name="page-footer">
      <fo:block font-size="small" text-align="center" hyphenate="true" 
language="en" space-after.conditionality="retain" space-after="0.5in"/>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body">
      <fo:block hyphenate="true" language="en" role="body">
        <fo:inline>Hyphenation for this table data works - when all the data 
is non numeric</fo:inline>
        <fo:table>
          <fo:table-body>
            <fo:table-row>
              <fo:table-cell>
                <fo:table>
                  <fo:table-body>
                    <fo:table-row>
                      <fo:table-cell border-width="1pt" border-style="solid">
                        <fo:block hyphenate="true" language="en">HyphenationOfThisBlockWorksNicelyUnlessWhenThereIsNoCommaOrNumericDataBeforeIt</fo:block>
                      </fo:table-cell>
                      <fo:table-cell><fo:block/></fo:table-cell>
                    </fo:table-row>
                    <fo:table-row>
                      <fo:table-cell border-width="1pt" border-style="solid">
                        <fo:block hyphenate="true" language="en">HyphenationOfThisBlockWorksNicely,OnlyForTheBlockBeforeTheCommaInTheBlockData</fo:block>
                      </fo:table-cell>
                      <fo:table-cell><fo:block/></fo:table-cell>
                    </fo:table-row>
                    <fo:table-row>
                      <fo:table-cell border-width="1pt" border-style="solid">
                        <fo:block hyphenate="true" language="en">123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890</fo:block>
                      </fo:table-cell>
                      <fo:table-cell><fo:block/></fo:table-cell>
                    </fo:table-row>
                  </fo:table-body>
                </fo:table>
              </fo:table-cell>
            </fo:table-row>
          </fo:table-body>
        </fo:table>
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

DO NOT REPLY [Bug 42162] - hyphenation inside block in FOP works only for pure alphabetical characters

Posted by bu...@apache.org.
------- Additional Comments From anuja_gok@yahoo.com  2007-04-18 11:41 -------
Created an attachment (id=19993)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=19993&action=view)
This is the FO file that reproduces the bug



Re: DO NOT REPLY [Bug 42162] - hyphenation inside block in FOP works only for pure alphabetical characters

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Andreas L Delmelle wrote:
> Seems to me the reporter is wrong to expect that sequence of 80+ digits 
> to be hyphenated under any circumstance, and even the comma-case... Easy 
> enough to come up with such oddities, but when would you ever really 
> need that? And more importantly: Is it really hyphenation you would need 
> then?

No, it's more of either wrapping, or alternative line breaking (as
in the case of long URLs). Hyphenation applies to words, and words
contain letters and in some languages also various punctuation
characters.
Nevertheless, giving the user some higher level possibilities
(i.e. other than inserting ZWS) to control wrapping or alternative
line breaking for certain long character sequences where UAX#14
gives unacceptable results is something worth thinking about.
Reusing the hyphenator for this might help keep the necessary
code short.
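
For reference, the ZWS approach mentioned above might look like the
following sketch. The digit string is a shortened version of the one in
the bug report, and the break positions (every ten digits) are an
arbitrary choice for illustration, not something the thread prescribes:

```xml
<!-- Sketch only. &#x200B; is U+200B ZERO WIDTH SPACE, which UAX#14
     treats as a line-break opportunity. No hyphen is drawn at the break.
     The break positions (every 10 digits) are chosen arbitrarily. -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">1234567890&#x200B;1234567890&#x200B;1234567890&#x200B;1234567890</fo:block>
```

This keeps hyphenation out of the picture entirely: the line may break at
any of the marked positions, and nothing else about the block changes.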

J.Pietschmann

Re: DO NOT REPLY [Bug 42162] - hyphenation inside block in FOP works only for pure alphabetical characters

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jul 9, 2007, at 22:30, J.Pietschmann wrote:

> a_l.delmelle wrote in a bugzilla entry:
>> Hyphenation is, in fact, only applicable to pure alphabetical  
>> characters.
>
> Well, no. The pattern based hyphenator can deal with any Unicode
> characters (apart from digits, whitespace and the dot, which have
> a special meaning in the pattern definitions). If the word parser
> used the character classes from the active pattern file for
> parsing words, basically anything could be used. This would only
> need a proper interface for retrieving the character classes. The
> class canonicalization could even be folded into the parsing process
> for better performance.

OK, I see the possibilities. The fact that digits have this special  
meaning in the patterns does have its reasons, though.
I have yet to encounter a text in which anything was hyphenated but  
words. Dates or timestamps? Digits? Serial numbers? E-mail addresses?  
URLs? Meaningless,ArtificallyGlued-togetherPseudo*Words?
Never seen any of those hyphenated. Wrapped, sometimes, but never  
hyphenated.

Looked around a bit, and combining 'hyphenation' and 'numbers' only  
got me in the direction of hyphenation /of/ numbers when spelled out  
completely --as words.
So what I meant by that statement was: Hyphenation makes sense only  
in the context of written text, as in relation to a dictionary.

Seems to me the reporter is wrong to expect that sequence of 80+
digits to be hyphenated under any circumstance, and even the
comma-case... Easy enough to come up with such oddities, but when would
you ever really need that? And more importantly: Is it really
hyphenation you would need then?


Cheers

Andreas

Re: DO NOT REPLY [Bug 42162] - hyphenation inside block in FOP works only for pure alphabetical characters

Posted by "J.Pietschmann" <j3...@yahoo.de>.
a_l.delmelle wrote in a bugzilla entry:
> Hyphenation is, in fact, only applicable to pure alphabetical characters.

Well, no. The pattern based hyphenator can deal with any Unicode
characters (apart from digits, whitespace and the dot, which have
a special meaning in the pattern definitions). If the word parser
used the character classes from the active pattern file for
parsing words, basically anything could be used. This would only
need a proper interface for retrieving the character classes. The
class canonicalization could even be folded into the parsing process
for better performance.
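
To illustrate why digits and the dot are reserved, here is a minimal
sketch of a pattern file in the XML layout used by FOP's hyphenation
pattern files, as far as I recall it. The class and pattern entries are
invented for this sketch and are not taken from any shipped pattern file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative fragment only; all entries below are made up. -->
<hyphenation-info>
  <!-- Character classes: each line groups characters that are treated
       as equivalent when matching patterns (e.g. lower and upper case).
       A character not listed here is not part of any class. -->
  <classes>
aA
bB
tT
  </classes>
  <!-- Patterns: a digit between letters is a break weight (odd values
       allow a break at that position, even values forbid it), and a
       dot marks a word boundary. This is why digits and the dot cannot
       appear as ordinary characters inside a pattern. -->
  <patterns>
.ab1
t2a
  </patterns>
</hyphenation-info>
```

Under this reading, a comma could in principle be declared in the classes
section like any letter; whether the word parser then hands such a
sequence to the hyphenator is the separate question raised above.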

J.Pietschmann

DO NOT REPLY [Bug 42162] - hyphenation inside block in FOP works only for pure alphabetical characters

Posted by bu...@apache.org.
------- Additional Comments From a_l.delmelle@pandora.be  2007-07-09 11:11 -------
I think I see the problem here, but I'm not sure it's a bug... Not all of it, that is.
Hyphenation is, in fact, only applicable to pure alphabetical characters. Strictly speaking, one cannot 
'hyphenate' a seven-digit number, if I interpret correctly.

That said, the case with the comma could perhaps be handled better. Currently, once the hyphenator
meets the comma (or, more generally, any non-letter), it stops attempting to hyphenate the rest of
the text.

What you're really looking for is unconditional wrapping of the text, rather than hyphenation, it seems. 
That would be wrap-option="wrap" on the blocks, which we claim to support according to the compliance 
page. After having a quick try, this feature seems to be broken, however...
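
For reference, setting the property mentioned above would look like the
following sketch. The digit string is a shortened version of the one in
the report. As noted, a quick try did not give the hoped-for result, so
this shows the attempt rather than a confirmed fix:

```xml
<!-- Sketch only. wrap-option="wrap" is the initial value of this
     property in XSL-FO, so stating it explicitly mainly documents
     the intent. -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format" wrap-option="wrap">1234567890123456789012345678901234567890</fo:block>
```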


DO NOT REPLY [Bug 42162] - hyphenation inside block in FOP works only for pure alphabetical characters

Posted by bu...@apache.org.
------- Additional Comments From anuja_gok@yahoo.com  2007-04-18 11:42 -------
Created an attachment (id=19994)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=19994&action=view)
Associated PDF file that gets created when FOP is run on the attached FO file

