Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2009/08/24 13:44:01 UTC

DO NOT REPLY [Bug 47726] New: Line breaking a word in the Thai language.

https://issues.apache.org/bugzilla/show_bug.cgi?id=47726

           Summary: Line breaking a word in the Thai language.
           Product: Fop
           Version: 0.94
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: critical
          Priority: P2
         Component: pdf
        AssignedTo: fop-dev@xmlgraphics.apache.org
        ReportedBy: ngsonhung@gmail.com


--- Comment #0 from Hung S Nguyen <ng...@gmail.com> 2009-08-24 04:44:00 PDT ---
When exporting to PDF, FOP does not render Thai text correctly. It displays the
Thai characters, but it breaks lines in the middle of words. I tried the
attributes related to white space, but they did not help. How can I fix this
issue?

Example from my FO file:
...
<fo:block font-weight="normal" font-family="Arial MS" line-height="12pt"
font-size="12pt" space-before.optimum="8pt" space-after.optimum="8pt"
start-indent="1cm" end-indent="1cm">1เป็นส่วนผสมที่ละลายได้ทันที
2และไม่จำเป็นต้องใช้กากไก่ในการเตรียม ซึ่งน้ำเกรวี่จะมีลักษณะเนียน
และมีกลิ่นรสของไก่ที่ 5หอมอร่อย 
</fo:block>
...

Thanks
Hung

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

--- Comment #5 from J.Pietschmann <j3...@yahoo.de> 2009-11-17 13:20:18 UTC ---
(In reply to comment #4)
> Assuming I use ICU4J to put the &#x200b; char at the correct places in the
> Thai string, how can we get lines to break as we expect?

This should work. AFAICT, Thai letters are mapped to the class AL (ordinary
letter) for line-breaking purposes in FOP, which means FOP wouldn't break lines
in Thai text except at the Zero Width Spaces.

--- Comment #4 from Hung S Nguyen <ng...@gmail.com> 2009-11-17 00:51:08 UTC ---
I'm sorry, I was busy with other tasks and could not continue. Now that I'm
back on this issue, I have tried many approaches: I set many of the
whitespace-related attributes and inserted the &#x200b; char between the Thai
words, and I even read the code, but I still cannot find a way to break lines
as I expect.

Assuming I use ICU4J to put the &#x200b; char at the correct places in the
Thai string, how can we get lines to break as we expect?

Are there attributes that group words so that lines break only between groups,
or that break lines only at whitespace?

Thank you very much

--- Comment #1 from Manuel Mall <ma...@apache.org> 2009-08-24 05:05:01 PDT ---
I have no idea how word boundaries are determined in the Thai language, but
from your snippet it appears to me that they are not indicated by whitespace.
I suggest you try putting a ZWSP (Zero Width Space, &#x200b;) between the Thai
characters wherever there is a Thai word boundary.

--- Comment #7 from Glenn Adams <gl...@skynav.com> 2012-04-07 01:41:59 UTC ---
resetting P2 open bugs to P3 pending further review

--- Comment #6 from Hung S Nguyen <ng...@gmail.com> 2009-12-03 01:55:47 UTC ---
Created an attachment (id=24664)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=24664)
TextLayoutManager.java

I don't think Thai letters are mapped to the class AL. If you debug
LineBreakStatus.java --> nextChar() and print currentClass, it is 30 (SA). SA
means South East Asian (http://unicode.org/reports/tr14/#SA).

With class SA, FOP is able to break a line at any position within a Thai word.
A comment in LineBreakStatus.java also says: "* TODO: Better handling for AI,
SA, CB and other line break classes.".

I have now fixed the issue in FOP 0.94 and attached my changed file. Do you
agree with my fix? Please let me know what you think.

Thanks
Hung

--- Comment #3 from Peter S. Housel <ho...@acm.org> 2009-08-31 10:19:51 PDT ---
(In reply to comment #2)
> The Unicode UAX#14 indicates that proper line breaking for the Thai language
> involves morphological analysis in order to determine word boundaries. The
> standard considered this as too complex and left it to the "higher levels
> of processing".
> The libthai project (http://linux.thai.net/projects/libthai) produces open
> source software for this purpose, written in C/C++, which is used by Mozilla,
> Gnome applications and other OSS. Apparently, Java applications aren't as
> easily supported, yet.

The com.ibm.icu.text.ThaiBreakIterator class in recent versions of ICU4J can
supposedly do this. It makes use of an included dictionary of Thai words in
order to locate valid break points.
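
As a rough sketch of how such break points could be turned into the ZWSP
markers suggested in comment #1, the helper below walks the boundaries reported
by a break iterator and inserts U+200B at each interior one. It is only an
illustration under assumptions: the class and method names (ZwspInserter,
insertZwsp) are made up here and are not part of FOP or ICU4J, and it uses the
JDK's java.text.BreakIterator instead of ICU4J so that it runs without extra
dependencies. Whether Thai boundaries are actually found depends on the locale
data available to the runtime. com.ibm.icu.text.BreakIterator offers the same
getLineInstance/setText/first/next calls and could be swapped in.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Illustrative helper (hypothetical, not part of FOP): marks break opportunities with U+200B. */
final class ZwspInserter {
    /** Zero Width Space, the character FOP treats as a break opportunity. */
    static final char ZWSP = '\u200B';

    private ZwspInserter() { }

    /** Inserts a ZWSP at every interior boundary reported by the given iterator. */
    static String insertZwsp(String text, BreakIterator boundaries) {
        boundaries.setText(text);
        StringBuilder out = new StringBuilder(text.length() + 16);
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            out.append(text, start, end);
            // Skip the end of the text, and skip places where whitespace or an
            // existing ZWSP already provides a break opportunity.
            if (end < text.length()
                    && !isBreakChar(text.charAt(end - 1))
                    && !isBreakChar(text.charAt(end))) {
                out.append(ZWSP);
            }
        }
        return out.toString();
    }

    /** Convenience overload: uses the line-break iterator for the given locale. */
    static String insertZwsp(String text, Locale locale) {
        return insertZwsp(text, BreakIterator.getLineInstance(locale));
    }

    private static boolean isBreakChar(char c) {
        return c == ZWSP || Character.isWhitespace(c);
    }
}
```

The text of each fo:block could be passed through something like
ZwspInserter.insertZwsp(thaiText, Locale.forLanguageTag("th")) before the FO
file is handed to FOP. Text that is already separated by whitespace is left
unchanged.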

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|critical                    |normal

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3

--- Comment #2 from J.Pietschmann <j3...@yahoo.de> 2009-08-27 12:03:57 PDT ---
Unicode UAX#14 indicates that proper line breaking for the Thai language
involves morphological analysis in order to determine word boundaries. The
standard considers this too complex and leaves it to the "higher levels
of processing".

The libthai project (http://linux.thai.net/projects/libthai) produces open
source software for this purpose, written in C/C++, which is used by Mozilla,
Gnome applications and other OSS. Apparently, Java applications aren't as
easily supported yet.
