Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2007/07/09 20:11:23 UTC

DO NOT REPLY [Bug 42162] - hyphenation inside block in FOP works only for pure alphabetical characters

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=42162>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=42162

------- Additional Comments From a_l.delmelle@pandora.be  2007-07-09 11:11 -------
I think I see the problem here, but I'm not sure it's a bug... Not all of it, that is.
Hyphenation is, in fact, only applicable to pure alphabetical characters. Strictly speaking, one cannot 
'hyphenate' a seven-digit number, if I interpret correctly.

That said, the case with the comma could perhaps be handled better. Currently, once the hyphenator 
meets the comma (or, more generally, any non-letter), it no longer even attempts to hyphenate the 
rest of the text.
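
To make the difference concrete, here is a minimal sketch in Python. It is not FOP's actual code; the 
function names are made up for illustration, and "letter" is approximated by Python's str.isalpha(). 
The first function models a scanner that gives up at the first non-letter, the second one collects 
every run of letters so that each could be offered to a hyphenator separately.

```python
def leading_letter_run(text):
    """Return the prefix of text that consists only of letters.

    Models a scanner that stops for good at the first non-letter.
    (Hypothetical helper, not part of FOP.)
    """
    end = 0
    while end < len(text) and text[end].isalpha():
        end += 1
    return text[:end]


def letter_runs(text):
    """Return (start, run) pairs for every maximal run of letters in text."""
    runs = []
    start = None
    for i, ch in enumerate(text):
        if ch.isalpha():
            if start is None:
                start = i          # a new run of letters begins here
        elif start is not None:
            runs.append((start, text[start:i]))
            start = None           # the run ended at a non-letter
    if start is not None:
        runs.append((start, text[start:]))
    return runs


sample = "hyphenation,wrapping"
print(leading_letter_run(sample))  # only the part before the comma
print(letter_runs(sample))         # both runs, with their offsets
```

A string of digits yields no letter runs at all in this sketch, which matches the point that there is 
nothing to hyphenate in a seven-digit number.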

It seems that what you're really looking for is unconditional wrapping of the text rather than 
hyphenation. That would be wrap-option="wrap" on the blocks, which we claim to support according to the 
compliance page. After a quick try, however, this feature seems to be broken...


Re: DO NOT REPLY [Bug 42162] - hyphenation inside block in FOP works only for pure alphabetical characters

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Andreas L Delmelle wrote:
> Seems to me the reporter is wrong to expect that sequence of 80+ digits 
> to be hyphenated under any circumstance, and even the comma-case... Easy 
> enough to come up with such oddities, but when would you ever really 
> need that? And more importantly: Is it really hyphenation you would need 
> then?

No, it's more a matter of either wrapping or alternative line
breaking (as in the case of long URLs). Hyphenation applies to
words, and words contain letters and, in some languages, also
various punctuation characters.
Nevertheless, giving the user some higher level possibilities
(i.e. other than inserting ZWS) to control wrapping or alternative
line breaking for certain long character sequences where UAX#14
gives unacceptable results is something worth thinking about.
Reusing the hyphenator for this might help keep the necessary
code short.
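
For reference, the manual workaround mentioned above (inserting ZWS,
i.e. U+200B ZERO WIDTH SPACE) could be sketched in Python roughly as
follows. This is only an illustration of preprocessing the text before
it reaches the formatter, not FOP code; the function name and the
max_run parameter are assumptions made up for the example.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, a break opportunity under UAX #14


def add_break_opportunities(text, max_run=10):
    """Insert ZWSP after every max_run characters of a whitespace-free run.

    Hypothetical helper: max_run is an arbitrary illustrative limit.
    """
    if max_run < 1:
        raise ValueError("max_run must be at least 1")
    out = []
    run_length = 0
    for ch in text:
        if ch.isspace() or ch == ZWSP:
            run_length = 0           # an existing break opportunity resets the run
        else:
            if run_length == max_run:
                out.append(ZWSP)     # allow a break before this character
                run_length = 0
            run_length += 1
        out.append(ch)
    return "".join(out)


broken = add_break_opportunities("1234567", max_run=3)
print(broken.split(ZWSP))
```

Removing the inserted characters gives back the original text, so the
transformation only adds break opportunities and changes nothing else.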

J.Pietschmann

Re: DO NOT REPLY [Bug 42162] - hyphenation inside block in FOP works only for pure alphabetical characters

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jul 9, 2007, at 22:30, J.Pietschmann wrote:

> a_l.delmelle wrote in a bugzilla entry:
>> Hyphenation is, in fact, only applicable to pure alphabetical  
>> characters.
>
> Well, no. The pattern-based hyphenator can deal with any Unicode
> characters (apart from digits, whitespace and the dot, which have
> a special meaning in the pattern definitions). If the word parser
> used the character classes from the active pattern file for
> parsing words, basically anything could be used. This would only
> need a proper interface for retrieving the character classes. The
> class canonicalization could even be folded into the parsing process
> for better performance.

OK, I see the possibilities. The fact that digits have this special  
meaning in the patterns does have its reasons, though.
I have yet to encounter a text in which anything was hyphenated but  
words. Dates or timestamps? Digits? Serial numbers? E-mail addresses?  
URLs? Meaningless,ArtificiallyGlued-togetherPseudo*Words?
Never seen any of those hyphenated. Wrapped, sometimes, but never  
hyphenated.
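
As a small illustration of the special meaning of digits mentioned
above: assuming TeX-style (Liang) patterns, a digit inside a pattern is
not a character to be matched but a weight for the position between two
letters. The Python sketch below splits such a pattern into its letters
and weights. The function name is made up, and "hy3ph" is used purely
as an illustrative pattern.

```python
def parse_pattern(pattern):
    """Split a TeX-style pattern such as 'hy3ph' into letters and weights.

    Returns (letters, weights). weights[i] belongs to the position just
    before letters[i]; the last entry belongs to the position after the
    final letter. Positions without a digit get weight 0.
    (Hypothetical helper for illustration only.)
    """
    letters = []
    weights = [0]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)   # digit sets the weight of the current gap
        else:
            letters.append(ch)
            weights.append(0)       # open a new gap after this character
    return "".join(letters), weights


print(parse_pattern("hy3ph"))
```

Because digits are consumed as weights here, a literal digit in the
text could never appear as a matchable character in such a pattern.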

I looked around a bit, and searching for 'hyphenation' combined with
'numbers' only led me to hyphenation /of/ numbers when spelled out
completely, as words.
So what I meant by that statement was: Hyphenation makes sense only
in the context of written text, as in relation to a dictionary.

Seems to me the reporter is wrong to expect that sequence of 80+
digits to be hyphenated under any circumstance, and even the
comma-case... Easy enough to come up with such oddities, but when
would you ever really need that? And more importantly: Is it really
hyphenation you would need then?


Cheers

Andreas

Re: DO NOT REPLY [Bug 42162] - hyphenation inside block in FOP works only for pure alphabetical characters

Posted by "J.Pietschmann" <j3...@yahoo.de>.
a_l.delmelle wrote in a bugzilla entry:
> Hyphenation is, in fact, only applicable to pure alphabetical characters.

Well, no. The pattern-based hyphenator can deal with any Unicode
characters (apart from digits, whitespace and the dot, which have
a special meaning in the pattern definitions). If the word parser
used the character classes from the active pattern file for
parsing words, basically anything could be used. This would only
need a proper interface for retrieving the character classes. The
class canonicalization could even be folded into the parsing process
for better performance.
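
A rough sketch of that idea, in Python rather than FOP's Java and with
made-up names: it assumes the character classes are available as a
list of strings in which the first character of each string is the
canonical form of that class. The parser then treats any character
found in a class as part of a word and canonicalizes it in the same
pass.

```python
def build_class_map(classes):
    """Map every character of each class to the class's first character.

    Assumption for this sketch: classes is a list of strings such as
    ["aA", "bB"], with the canonical character listed first.
    """
    mapping = {}
    for cls in classes:
        for ch in cls:
            mapping[ch] = cls[0]
    return mapping


def parse_words(text, class_map):
    """Yield (start, original, canonical) for each run of known characters."""
    start = None
    canonical = []
    for i, ch in enumerate(text):
        if ch in class_map:
            if start is None:
                start = i                    # a new word begins here
            canonical.append(class_map[ch])  # canonicalize while parsing
        elif start is not None:
            yield start, text[start:i], "".join(canonical)
            start, canonical = None, []
    if start is not None:
        yield start, text[start:], "".join(canonical)


class_map = build_class_map(["aA", "bB", "-"])
print(list(parse_words("Ab-ba, AB", class_map)))
```

With "-" listed as a class of its own, the hyphen-minus becomes part of
the word, which is the kind of flexibility described above. What counts
as a word is then decided entirely by the pattern file.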

J.Pietschmann