Posted to fop-dev@xmlgraphics.apache.org by Carlos Villegas <ca...@uniscope.co.jp> on 2000/09/07 13:46:51 UTC

Hyphenation, multilanguage support, and line breaking

Hi,

This is regarding the recent messages about multi-language support and
hyphenation.

I've been working for quite a while (my spare time is very limited)
on a hyphenation algorithm.
This is what I've done so far:

 - implemented Liang's (TeX) hyphenation algorithm, as described in
   Appendix H of The TeXbook, in Java (I had a C++ version that
   I've been porting to Java since I heard about FOP)

 - created a simple XML DTD and an XML reader class to load
   hyphenation patterns from a file

 - converted some of TeX's (or more accurately Lout's, which come
   from TeX) hyphenation files to this format

 - basically, I have a method "String hyphenate(String word)" in my
   HyphenationTree class that currently returns the hyphenated word
   with the hyphens inserted at the proper places (e.g. "hy-phen-ation").
   This is for testing; we need whatever the line breaking algorithm
   requires.

 - verified the algorithm against TeX's; it produces the same results.
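
To illustrate the pattern-matching step of Liang's algorithm listed above,
here is a minimal Java sketch. It is not the HyphenationTree class from this
message: the class name LiangSketch, the HashMap lookup (a real implementation
would more likely use a trie) and the constructor parameters are assumptions
made for the example. The nine patterns are the ones The TeXbook, Appendix H,
uses to explain how "hyphenation" is broken; the limits 2 and 3 mirror plain
TeX's default \lefthyphenmin and \righthyphenmin.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch class, not the HyphenationTree described in the message.
class LiangSketch {
    // Pattern letters -> inter-letter levels (one more entry than letters).
    private final Map<String, int[]> patterns = new HashMap<>();
    private final int leftMin;   // minimum letters before a break
    private final int rightMin;  // minimum letters after a break

    LiangSketch(int leftMin, int rightMin, String... rawPatterns) {
        this.leftMin = leftMin;
        this.rightMin = rightMin;
        for (String p : rawPatterns) {
            addPattern(p);
        }
    }

    // Splits e.g. "hy3ph" into letters "hyph" and levels {0, 0, 3, 0, 0}.
    private void addPattern(String raw) {
        int letterCount = 0;
        for (char c : raw.toCharArray()) {
            if (!Character.isDigit(c)) letterCount++;
        }
        int[] points = new int[letterCount + 1];
        StringBuilder letters = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (Character.isDigit(c)) {
                points[letters.length()] = c - '0';
            } else {
                letters.append(c);
            }
        }
        patterns.put(letters.toString(), points);
    }

    // Returns the word with '-' inserted at every permitted break.
    String hyphenate(String word) {
        // Dots mark the word boundaries, as in TeX's pattern files.
        String w = "." + word.toLowerCase(Locale.ROOT) + ".";
        int[] levels = new int[w.length() + 1];
        for (int i = 0; i < w.length(); i++) {
            for (int j = i + 1; j <= w.length(); j++) {
                int[] points = patterns.get(w.substring(i, j));
                if (points == null) continue;
                // Keep the highest level seen at each inter-letter position.
                for (int k = 0; k < points.length; k++) {
                    levels[i + k] = Math.max(levels[i + k], points[k]);
                }
            }
        }
        StringBuilder out = new StringBuilder();
        for (int p = 0; p < word.length(); p++) {
            boolean allowed = p >= leftMin && word.length() - p >= rightMin;
            // An odd level permits a break, an even level forbids it.
            if (allowed && levels[p + 1] % 2 == 1) out.append('-');
            out.append(word.charAt(p));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        LiangSketch h = new LiangSketch(2, 3, "hy3ph", "he2n", "hena4",
                "hen5at", "1na", "n2at", "1tio", "2io", "o2n");
        System.out.println(h.hyphenate("hyphenation")); // prints hy-phen-ation
    }
}
```

A line breaker would probably want the break positions rather than a string
with hyphens in it; returning the odd entries of the levels array would be
one way to expose them.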

Things to do:

 - integrate it into FOP. I haven't really studied the line breaking
   process in FOP, just had a quick look at the LineArea class. I saw
   some comments on this list about that needing some major rework.

 - write a method to dump the HyphenationTree class to something
   faster to load (although I don't find loading the patterns from
   an XML file to be slow; it's just that both TeX and Lout load a
   binary dump in production runs)

 - I want to implement something equivalent to TeX's \discretionary
   so that we can support words that change spelling when split
   (as in German). I thought about adding them to the exception list
   in the pattern file, but I don't know how common they are in
   languages with that 'feature'. If they are not so common, the
   exception list will do fine, with some way of tagging the full
   "discretionary" hyphens.

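As a rough illustration of the \discretionary idea in the list above, the
sketch below models one break point with the three texts TeX uses: what is
printed before the break, what is printed after it, and what is printed when
no break occurs. The class and method names are hypothetical and nothing here
is FOP or HyphenationTree API. The example word follows traditional
(pre-reform) German spelling, where "ck" becomes "k-k" when split, which in
TeX terms is \discretionary{k-}{k}{ck}.

```java
// Hypothetical sketch; names are assumptions made for this example.
class DiscretionarySketch {

    /** One break point with TeX-style pre-break, post-break and no-break texts. */
    static final class Discretionary {
        final String preBreak;
        final String postBreak;
        final String noBreak;

        Discretionary(String preBreak, String postBreak, String noBreak) {
            this.preBreak = preBreak;
            this.postBreak = postBreak;
            this.noBreak = noBreak;
        }
    }

    /** Text of head + break point + tail when the line is NOT broken here. */
    static String unbroken(String head, Discretionary d, String tail) {
        return head + d.noBreak + tail;
    }

    /** The two line fragments produced when the line IS broken here. */
    static String[] broken(String head, Discretionary d, String tail) {
        return new String[] { head + d.preBreak, d.postBreak + tail };
    }

    public static void main(String[] args) {
        // Traditional German: "backen" splits as "bak-" / "ken".
        Discretionary ck = new Discretionary("k-", "k", "ck");
        System.out.println(unbroken("ba", ck, "en"));          // prints backen
        String[] parts = broken("ba", ck, "en");
        System.out.println(parts[0] + " | " + parts[1]);       // prints bak- | ken

        // An ordinary hyphenation point is the special case ("-", "", "").
        Discretionary plain = new Discretionary("-", "", "");
        String[] p = broken("hy", plain, "phenation");
        System.out.println(p[0] + " | " + p[1]);               // prints hy- | phenation
    }
}
```

An exception list could store such triples only for the words that need
them and fall back to the plain hyphen everywhere else.
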
I have a "hyphenation" package with about 4,000 lines of code so far.
What's the proper way to submit such a big patch? As a file attachment?

As soon as I clean up the hyphenation classes, I want to start work on
the line breaking algorithm. From what I saw, FOP uses a very simple
approach: basically, if a word doesn't fit, it's sent to the next line.
TeX sees the whole paragraph and determines the best breaks possible;
lines at the end of the paragraph may affect breaks at the beginning.
If possible, I want to try to implement some of TeX's concepts. I'm in
the process of gathering information about line breaking, and I came
across a reference to an article by Knuth on the subject that I haven't
been able to obtain. Maybe somebody on this list can help and send me a
copy:

Knuth, D.E. and Plass, M.F. (1981). Breaking paragraphs into lines.
   Software: Practice and Experience, 11, 1119-1184.
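
The difference between the two approaches described above can be shown with
a small Java sketch. It is a deliberate simplification and not the
Knuth-Plass algorithm itself: there is no glue, no penalties and no
hyphenation, widths are counted in characters, and the cost of a line is
simply the square of its unused space (the last line is free). The class and
method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; a simplified stand-in for whole-paragraph breaking.
class LineBreakSketch {

    /** First-fit: a word that does not fit is sent to the next line. */
    static List<String> firstFit(String[] words, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String w : words) {
            if (line.length() > 0 && line.length() + 1 + w.length() > width) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) line.append(' ');
            line.append(w);
        }
        if (line.length() > 0) lines.add(line.toString());
        return lines;
    }

    /** Whole-paragraph: minimise the sum of squared slack over all lines. */
    static List<String> totalFit(String[] words, int width) {
        int n = words.length;
        long[] cost = new long[n + 1]; // cost[i]: best cost for words i..n-1
        int[] next = new int[n + 1];   // next[i]: first word of the following line
        for (int i = n - 1; i >= 0; i--) {
            cost[i] = Long.MAX_VALUE;
            int len = -1; // cancels the space counted for the first word
            for (int j = i; j < n; j++) {
                len += words[j].length() + 1;
                if (len > width && j > i) break; // an overlong word stays alone
                long slack = Math.max(0, width - len);
                long lineCost = (j == n - 1) ? 0 : slack * slack;
                if (lineCost + cost[j + 1] < cost[i]) {
                    cost[i] = lineCost + cost[j + 1];
                    next[i] = j + 1;
                }
            }
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i = next[i]) {
            lines.add(String.join(" ", Arrays.copyOfRange(words, i, next[i])));
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] words = { "aaa", "bb", "cc", "ddddd" };
        System.out.println(firstFit(words, 6)); // prints [aaa bb, cc, ddddd]
        System.out.println(totalFit(words, 6)); // prints [aaa, bb cc, ddddd]
    }
}
```

In the example, the long word at the end forces a short line before it, and
the whole-paragraph version compensates by breaking earlier after the first
word, which is the "end affects the beginning" effect mentioned above.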

If somebody else is working on these areas (hyphenation, line breaking)
or has plans or ideas to share, please let me know. I know the overall
priorities for FOP at the moment don't cover it, but some of us can be
working on other "less important" problems like these.

As I said, my time is very limited, and I am still several weeks away
from submitting anything. Just to let you know, something is moving :-)


Atentamente,


Carlos Villegas

Re: Hyphenation, multilanguage support, and line breaking

Posted by Dave Pawson <da...@dpawson.freeserve.co.uk>.
At 09:55 PM 9/7/00 +0200, you wrote:
>Carlos Villegas wrote:
>
> > As I said, my time is very limited, and I am still several weeks away
> > from submitting anything. Just to let you know, something is moving :-)
>
>How cool! Thanks a lot for doing this, Carlos, FOP is a step closer to
>replacing TeX? ;-)
>
>hey, Sebastian, don't get pissed ok? just teasing :)


It would take an earthquake to make Sebastian even consider
anything being as good as TeX :-)

Regards DaveP


Still hoping for some helpful error message??



Re: Hyphenation, multilanguage support, and line breaking

Posted by Sebastian Rahtz <se...@computing-services.oxford.ac.uk>.
Stephan Albers writes:
 > What about NTS (New Typesetting System), a rewrite of TeX in Java?
 >   http://nts.tug.org/
 > They have come a long way, so maybe they have an interesting code
 > foundation.

They have re-implemented TeX *exactly* in Java, no more, no less. It runs much,
much slower. And it has not been released. I wish I could believe it
could provide stuff to FOP, but I am not very convinced.

 > OOOHH: I just found that NTS doesn't support Hyphenation yet, so it
 > would make sense to point them to Carlos's hyphenation.

eh? NTS is an exact clone of TeX, they must have done it by
now. the page you cite is from July - the demo they did in Oxford in
August was supposed to be 100% complete

Sebastian


Re: Hyphenation, multilanguage support, and line breaking

Posted by Stephan Albers <St...@jcatalog.com>.
Sebastian Rahtz schrieb:
> no, seriously, what Carlos describes sounds really excellent. exactly
> what we should be doing, taking the TeX algorithms and re-using
> them. his message was the best news I have seen for ages.

What about NTS (New Typesetting System), a rewrite of TeX in Java?
  http://nts.tug.org/
They have come a long way, so maybe they have an interesting code
foundation.

I don't know about the license, but that could also be a nice starting
point for hyphenation. I am not sure if Carlos is aware of this.

OOOHH: I just found that NTS doesn't support Hyphenation yet, so it
would make sense to point them to Carlos's hyphenation.

Stephan
(Back from holidays in Poland)

Re: Hyphenation, multilanguage support, and line breaking

Posted by Sebastian Rahtz <se...@computing-services.oxford.ac.uk>.
Stefano Mazzocchi writes:
 > 
 > How cool! Thanks a lot for doing this, Carlos, FOP is a step closer to
 > replacing TeX? ;-)
 > 
 > hey, Sebastian, don't get pissed ok? just teasing :)

no, seriously, what Carlos describes sounds really excellent. exactly
what we should be doing, taking the TeX algorithms and re-using
them. his message was the best news I have seen for ages.

Sebastian


Re: Hyphenation, multilanguage support, and line breaking

Posted by Stefano Mazzocchi <st...@apache.org>.
Carlos Villegas wrote:

> As I said, my time is very limited, and I am still several weeks away
> from submitting anything. Just to let you know, something is moving :-)

How cool! Thanks a lot for doing this, Carlos, FOP is a step closer to
replacing TeX? ;-)

hey, Sebastian, don't get pissed ok? just teasing :)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche



Re: Hyphenation, multilanguage support, and line breaking

Posted by Eric SCHAEFFER <es...@posterconseil.com>.
It would be really great !

Eric.



Re: Hyphenation, multilanguage support, and line breaking

Posted by Ar...@chebucto.ns.ca.
Quoting Fotis Jannidis <fo...@lrz.uni-muenchen.de>:

> From: Carlos Villegas <ca...@uniscope.co.jp>
> 
> > I have a "hyphenation" package with about 4,000 lines of code so far.
> > What's the proper way to submit such a big patch, a file attachment?
> 
> with such an amount of new code, I think the best approach would 
> be to ask on this list for a committer and send him/her your patch.
> 
> [...]
> > As I said, my time is very limited, and I am still several weeks away
> > from submitting anything. Just to let you know, something is moving :-)
> 
> Great news! You will make many people very happy :-)
> 
> Fotis
> 

I'll second that. :-) I'll see if I can't dig up that Knuth paper. BTW, GNU 
textutils uses that algorithm; I wonder if you can't just locate the source.

Arved Sandstrom



Re: Hyphenation, multilanguage support, and line breaking

Posted by Fotis Jannidis <fo...@lrz.uni-muenchen.de>.
From: Carlos Villegas <ca...@uniscope.co.jp>

> I have a "hyphenation" package with about 4,000 lines of code so far.
> What's the proper way to submit such a big patch, a file attachment?

with such an amount of new code, I think the best approach would 
be to ask on this list for a committer and send him/her your patch.

[...]
> As I said, my time is very limited, and I am still several weeks away
> from submitting anything. Just to let you know, something is moving :-)

Great news! You will make many people very happy :-)

Fotis