Posted to fop-dev@xmlgraphics.apache.org by Glenn Adams <gl...@skynav.com> on 2013/06/18 06:46:48 UTC

FOP using ICU?

Is there a reason FOP doesn't use ICU for determining line break
boundaries? The FOP implementation of UAX14 (org.apache.fop.text.linebreak)
seems to be out of date and basically unmaintained. According to [1], a
number of Apache projects are using it, including PDFBox, Xalan, and Xerces.

[1] http://site.icu-project.org/#TOC-Apache-Projects

Re: FOP using ICU?

Posted by Glenn Adams <gl...@skynav.com>.
By interoperability, I mean interoperability with different language line
breaking requirements. For example, Thai (and a number of other languages)
requires dictionary-based support for line breaking. ICU supports this
today, while it is highly unlikely we would ever add this support to the
existing FOP implementation of UAX14.
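
For readers unfamiliar with the API shape under discussion, here is a
minimal sketch of enumerating line break opportunities with a break
iterator. It uses the JDK's java.text.BreakIterator as a stand-in so that
it runs without extra jars; that substitution is an assumption made for
illustration and is not FOP code. ICU4J's com.ibm.icu.text.BreakIterator
exposes the same getLineInstance/setText/first/next calls, so swapping the
import should be the main change. The names LineBreakSketch and
breakOpportunities are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineBreakSketch {

    /**
     * Returns every boundary reported by a line break iterator for the
     * given text, including the start offset and the end offset.
     */
    public static List<Integer> breakOpportunities(String text, Locale locale) {
        // Stand-in: JDK iterator. With ICU4J on the classpath, the import
        // would be com.ibm.icu.text.BreakIterator instead (assumption).
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);

        List<Integer> offsets = new ArrayList<>();
        // first() yields the start boundary; next() walks forward until DONE.
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "FOP using ICU?";
        for (int offset : breakOpportunities(text, Locale.ENGLISH)) {
            System.out.println("boundary at offset " + offset);
        }
    }
}
```

For languages such as Thai, where words are not separated by spaces, the
quality of the reported boundaries depends on the iterator implementation
behind getLineInstance, which is the point made above.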


On Tue, Jun 18, 2013 at 6:04 PM, Glenn Adams <gl...@skynav.com> wrote:

> My position is that it is costing us in interoperability (I mean lack
> thereof) by failing to use ICU. I don't see any issue about size.
>
>
> On Tue, Jun 18, 2013 at 6:00 PM, Vincent Hennebert <vh...@gmail.com> wrote:
>
>> On 18/06/13 06:46, Glenn Adams wrote:
>> > Is there a reason FOP doesn't use ICU for determining line break
>> > boundaries? The FOP implementation of UAX14
>> > (org.apache.fop.text.linebreak) seems to be out of date and basically
>> > unmaintained. According to [1], a number of Apache projects are using
>> > it, including PDFBox, Xalan, and Xerces.
>>
>> I think the main reason in the past has been the size of the ICU4J jar
>> compared to FOP’s own jar:
>> http://markmail.org/thread/krkqlircefpuxlse
>>
>> I guess the topic could be revisited today. We could consider adding it
>> as an optional dependency, or acknowledge that full Unicode support is
>> taken for granted nowadays and use it by default.
>>
>> >
>> > [1] http://site.icu-project.org/#TOC-Apache-Projects
>> >
>>
>> Vincent
>>
>
>

Re: FOP using ICU?

Posted by Vincent Hennebert <vh...@gmail.com>.
On 19/06/13 10:33, Glenn Adams wrote:
> On Wed, Jun 19, 2013 at 4:00 PM, Vincent Hennebert <vh...@gmail.com> wrote:
> 
>> On 19/06/13 02:40, Glenn Adams wrote:
>>> On Wed, Jun 19, 2013 at 1:05 AM, Chris Bowditch
>>> <bo...@hotmail.com> wrote:
>>>
>>>> Hi Glenn,
>>>>
>>>>
>>>> On 18/06/2013 11:12, Glenn Adams wrote:
>>>>
>>>>> To be more clear, I propose we replace FOP's implementation of UAX14
>>>>> with use of ICU's line break iterator, and that ICU becomes a standard
>>>>> dependency for FOP.
>>>>>
>>>>> However, before taking a decision on this, allow me to create a branch
>>>>> (on github) that actually makes this change so that folks can evaluate
>>>>> it. Is that a reasonable approach?
>>>>>
>>>>
>>>> +1, but I think we should create the branch in SVN as Vincent mentioned.
>>>> SVN is still the Version Control system used by the XML Graphics
>>>> project.
>>>>
>>>
>>> When I've completed the work in a github branch of a fork of the
>>> apache/fop mirror, I'll upload it to an SVN branch for folks to review
>>> who prefer SVN. Me, I've stopped using SVN for active development.
>>> Github is just so much better.
>>
>> By getting involved in FOP development you accept the project’s
>> practices... including a VCS you may not like :-)
>>
> 
> Just for you Vincent, I'll transfer whatever I do in github to an SVN temp
> branch when I'm ready for you to review. Others will be free to review in
> github as they prefer.

You will not be doing that for me. You will be doing that because this
is more in line (IMO) with the practices of an Apache project. I for one
prefer Git too.


>> Or, if you feel strongly enough about it, you can promote a switch to
>> Git, now that it’s officially supported by Apache.
>>
> 
> Perhaps I will do so when I have time to do the work of a switchover should
> such a proposal pass. In the meantime, I am not proposing a switch.
> 
> 
> 
>> That said, have you tried playing with git svn?
>>
> 
> Yes, I use it for Webkit development. However, it is still a pain over a
> pure git approach.
> 
> 
>>
>> FWIW, I have a Git repo that sits on top of the svn checkout and, after
>> ironing out some wrinkles, I find that the process works just fine.
>>
>> Vincent


Vincent

Re: FOP using ICU?

Posted by Glenn Adams <gl...@skynav.com>.
On Wed, Jun 19, 2013 at 4:00 PM, Vincent Hennebert <vh...@gmail.com> wrote:

> On 19/06/13 02:40, Glenn Adams wrote:
> > On Wed, Jun 19, 2013 at 1:05 AM, Chris Bowditch
> > <bo...@hotmail.com> wrote:
> >
> >> Hi Glenn,
> >>
> >>
> >> On 18/06/2013 11:12, Glenn Adams wrote:
> >>
> >>> To be more clear, I propose we replace FOP's implementation of UAX14
> >>> with use of ICU's line break iterator, and that ICU becomes a standard
> >>> dependency for FOP.
> >>>
> >>> However, before taking a decision on this, allow me to create a branch
> >>> (on github) that actually makes this change so that folks can evaluate
> >>> it. Is that a reasonable approach?
> >>>
> >>
> >> +1, but I think we should create the branch in SVN as Vincent mentioned.
> >> SVN is still the Version Control system used by the XML Graphics
> >> project.
> >>
> >
> > When I've completed the work in a github branch of a fork of the
> > apache/fop mirror, I'll upload it to an SVN branch for folks to review
> > who prefer SVN. Me, I've stopped using SVN for active development.
> > Github is just so much better.
>
> By getting involved in FOP development you accept the project’s
> practices... including a VCS you may not like :-)
>

Just for you Vincent, I'll transfer whatever I do in github to an SVN temp
branch when I'm ready for you to review. Others will be free to review in
github as they prefer.


>
> Or, if you feel strongly enough about it, you can promote a switch to
> Git, now that it’s officially supported by Apache.
>

Perhaps I will do so when I have time to do the work of a switchover should
such a proposal pass. In the meantime, I am not proposing a switch.



> That said, have you tried playing with git svn?
>

Yes, I use it for Webkit development. However, it is still a pain over a
pure git approach.


>
> FWIW, I have a Git repo that sits on top of the svn checkout and, after
> ironing out some wrinkles, I find that the process works just fine.
>
> Vincent
>

Re: FOP using ICU?

Posted by Vincent Hennebert <vh...@gmail.com>.
On 19/06/13 02:40, Glenn Adams wrote:
> On Wed, Jun 19, 2013 at 1:05 AM, Chris Bowditch
> <bo...@hotmail.com> wrote:
> 
>> Hi Glenn,
>>
>>
>> On 18/06/2013 11:12, Glenn Adams wrote:
>>
>>> To be more clear, I propose we replace FOP's implementation of UAX14 with
>>> use of ICU's line break iterator, and that ICU becomes a standard
>>> dependency for FOP.
>>>
>>> However, before taking a decision on this, allow me to create a branch
>>> (on github) that actually makes this change so that folks can evaluate it.
>>> Is that a reasonable approach?
>>>
>>
>> +1, but I think we should create the branch in SVN as Vincent mentioned.
>> SVN is still the Version Control system used by the XML Graphics project.
>>
> 
> When I've completed the work in a github branch of a fork of the apache/fop
> mirror, I'll upload it to an SVN branch for folks to review who prefer SVN.
> Me, I've stopped using SVN for active development. Github is just so much
> better.

By getting involved in FOP development you accept the project’s
practices... including a VCS you may not like :-)

Or, if you feel strongly enough about it, you can promote a switch to
Git, now that it’s officially supported by Apache.

That said, have you tried playing with git svn?

FWIW, I have a Git repo that sits on top of the svn checkout and, after
ironing out some wrinkles, I find that the process works just fine.

Vincent

Re: FOP using ICU?

Posted by Glenn Adams <gl...@skynav.com>.
On Wed, Jun 19, 2013 at 1:05 AM, Chris Bowditch
<bo...@hotmail.com> wrote:

> Hi Glenn,
>
>
> On 18/06/2013 11:12, Glenn Adams wrote:
>
>> To be more clear, I propose we replace FOP's implementation of UAX14 with
>> use of ICU's line break iterator, and that ICU becomes a standard
>> dependency for FOP.
>>
>> However, before taking a decision on this, allow me to create a branch
>> (on github) that actually makes this change so that folks can evaluate it.
>> Is that a reasonable approach?
>>
>
> +1, but I think we should create the branch in SVN as Vincent mentioned.
> SVN is still the Version Control system used by the XML Graphics project.
>

When I've completed the work in a github branch of a fork of the apache/fop
mirror, I'll upload it to an SVN branch for folks to review who prefer SVN.
Me, I've stopped using SVN for active development. Github is just so much
better.

Re: FOP using ICU?

Posted by Chris Bowditch <bo...@hotmail.com>.
Hi Glenn,

On 18/06/2013 11:12, Glenn Adams wrote:
> To be more clear, I propose we replace FOP's implementation of UAX14 
> with use of ICU's line break iterator, and that ICU becomes a standard 
> dependency for FOP.
>
> However, before taking a decision on this, allow me to create a branch 
> (on github) that actually makes this change so that folks can evaluate 
> it. Is that a reasonable approach?

+1, but I think we should create the branch in SVN as Vincent mentioned. 
SVN is still the Version Control system used by the XML Graphics project.

Thanks,

Chris

>
>
> On Tue, Jun 18, 2013 at 6:04 PM, Glenn Adams <glenn@skynav.com> wrote:
>
>     My position is that it is costing us in interoperability (I mean
>     lack thereof) by failing to use ICU. I don't see any issue about size.
>
>
>     On Tue, Jun 18, 2013 at 6:00 PM, Vincent Hennebert
>     <vhennebert@gmail.com> wrote:
>
>         On 18/06/13 06:46, Glenn Adams wrote:
>         > Is there a reason FOP doesn't use ICU for determining line break
>         > boundaries? The FOP implementation of UAX14
>         > (org.apache.fop.text.linebreak) seems to be out of date and
>         > basically unmaintained. According to [1], a number of Apache
>         > projects are using it, including PDFBox, Xalan, and Xerces.
>
>         I think the main reason in the past has been the size of the
>         ICU4J jar compared to FOP’s own jar:
>         http://markmail.org/thread/krkqlircefpuxlse
>
>         I guess the topic could be revisited today. We could consider
>         adding it as an optional dependency, or acknowledge that full
>         Unicode support is taken for granted nowadays and use it by
>         default.
>
>         >
>         > [1] http://site.icu-project.org/#TOC-Apache-Projects
>         >
>
>         Vincent
>
>
>


Re: FOP using ICU?

Posted by Vincent Hennebert <vh...@gmail.com>.
On 18/06/13 12:12, Glenn Adams wrote:
> To be more clear, I propose we replace FOP's implementation of UAX14 with
> use of ICU's line break iterator, and that ICU becomes a standard
> dependency for FOP.
> 
> However, before taking a decision on this, allow me to create a branch (on
> github) that actually makes this change so that folks can evaluate it. Is
> that a reasonable approach?

Sure, although creating a branch on Subversion would probably be
preferable.


> 
> On Tue, Jun 18, 2013 at 6:04 PM, Glenn Adams <gl...@skynav.com> wrote:
> 
>> My position is that it is costing us in interoperability (I mean lack
>> thereof) by failing to use ICU. I don't see any issue about size.
>>
>>
>> On Tue, Jun 18, 2013 at 6:00 PM, Vincent Hennebert <vh...@gmail.com> wrote:
>>
>>> On 18/06/13 06:46, Glenn Adams wrote:
>>>> Is there a reason FOP doesn't use ICU for determining line break
>>>> boundaries? The FOP implementation of UAX14
>>>> (org.apache.fop.text.linebreak) seems to be out of date and basically
>>>> unmaintained. According to [1], a number of Apache projects are using
>>>> it, including PDFBox, Xalan, and Xerces.
>>>
>>> I think the main reason in the past has been the size of the ICU4J jar
>>> compared to FOP’s own jar:
>>> http://markmail.org/thread/krkqlircefpuxlse
>>>
>>> I guess the topic could be revisited today. We could consider adding it
>>> as an optional dependency, or acknowledge that full Unicode support is
>>> taken for granted nowadays and use it by default.
>>>
>>>>
>>>> [1] http://site.icu-project.org/#TOC-Apache-Projects
>>>>
>>>

Vincent

Re: FOP using ICU?

Posted by Glenn Adams <gl...@skynav.com>.
To be more clear, I propose we replace FOP's implementation of UAX14 with
use of ICU's line break iterator, and that ICU becomes a standard
dependency for FOP.

However, before taking a decision on this, allow me to create a branch (on
github) that actually makes this change so that folks can evaluate it. Is
that a reasonable approach?


On Tue, Jun 18, 2013 at 6:04 PM, Glenn Adams <gl...@skynav.com> wrote:

> My position is that it is costing us in interoperability (I mean lack
> thereof) by failing to use ICU. I don't see any issue about size.
>
>
> On Tue, Jun 18, 2013 at 6:00 PM, Vincent Hennebert <vh...@gmail.com> wrote:
>
>> On 18/06/13 06:46, Glenn Adams wrote:
>> > Is there a reason FOP doesn't use ICU for determining line break
>> > boundaries? The FOP implementation of UAX14
>> > (org.apache.fop.text.linebreak) seems to be out of date and basically
>> > unmaintained. According to [1], a number of Apache projects are using
>> > it, including PDFBox, Xalan, and Xerces.
>>
>> I think the main reason in the past has been the size of the ICU4J jar
>> compared to FOP’s own jar:
>> http://markmail.org/thread/krkqlircefpuxlse
>>
>> I guess the topic could be revisited today. We could consider adding it
>> as an optional dependency, or acknowledge that full Unicode support is
>> taken for granted nowadays and use it by default.
>>
>> >
>> > [1] http://site.icu-project.org/#TOC-Apache-Projects
>> >
>>
>> Vincent
>>
>
>

Re: FOP using ICU?

Posted by Glenn Adams <gl...@skynav.com>.
My position is that it is costing us in interoperability (I mean lack
thereof) by failing to use ICU. I don't see any issue about size.


On Tue, Jun 18, 2013 at 6:00 PM, Vincent Hennebert <vh...@gmail.com> wrote:

> On 18/06/13 06:46, Glenn Adams wrote:
> > Is there a reason FOP doesn't use ICU for determining line break
> > boundaries? The FOP implementation of UAX14
> > (org.apache.fop.text.linebreak) seems to be out of date and basically
> > unmaintained. According to [1], a number of Apache projects are using
> > it, including PDFBox, Xalan, and Xerces.
>
> I think the main reason in the past has been the size of the ICU4J jar
> compared to FOP’s own jar:
> http://markmail.org/thread/krkqlircefpuxlse
>
> I guess the topic could be revisited today. We could consider adding it
> as an optional dependency, or acknowledge that full Unicode support is
> taken for granted nowadays and use it by default.
>
> >
> > [1] http://site.icu-project.org/#TOC-Apache-Projects
> >
>
> Vincent
>

Re: FOP using ICU?

Posted by Vincent Hennebert <vh...@gmail.com>.
On 18/06/13 06:46, Glenn Adams wrote:
> Is there a reason FOP doesn't use ICU for determining line break
> boundaries? The FOP implementation of UAX14 (org.apache.fop.text.linebreak)
> seems to be out of date and basically unmaintained. According to [1], a
> number of Apache projects are using it, including PDFBox, Xalan, and Xerces.

I think the main reason in the past has been the size of the ICU4J jar
compared to FOP’s own jar:
http://markmail.org/thread/krkqlircefpuxlse

I guess the topic could be revisited today. We could consider adding it
as an optional dependency, or acknowledge that full Unicode support is
taken for granted nowadays and use it by default.
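
The "optional dependency" route could look roughly like the sketch below:
probe for the ICU4J class at run time and fall back to something else when
the jar is absent. This is only an illustration under stated assumptions.
The class name com.ibm.icu.text.BreakIterator is ICU4J's real class, but
OptionalIcuSketch, icuAvailable and chooseProvider are hypothetical names,
the JDK fallback is a placeholder for whatever FOP would actually fall
back to, and nothing here reflects how FOP is configured.

```java
public class OptionalIcuSketch {

    // ICU4J's break iterator class; present only if the ICU4J jar is on
    // the classpath.
    static final String ICU_BREAK_ITERATOR = "com.ibm.icu.text.BreakIterator";

    /** Returns true if the ICU4J break iterator class can be loaded. */
    public static boolean icuAvailable() {
        try {
            Class.forName(ICU_BREAK_ITERATOR);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Jar missing or unusable: treat ICU as not available.
            return false;
        }
    }

    /**
     * Picks a line breaking provider label. "jdk" stands for a fallback
     * implementation (hypothetical; FOP's own fallback is not shown).
     */
    public static String chooseProvider() {
        return icuAvailable() ? "icu4j" : "jdk";
    }

    public static void main(String[] args) {
        System.out.println("line breaking provider: " + chooseProvider());
    }
}
```

The trade-off is the one raised in this message: an optional dependency
keeps the distribution small but means line breaking results can differ
between installations, whereas a default dependency gives every user the
same behaviour at the cost of a larger download.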

> 
> [1] http://site.icu-project.org/#TOC-Apache-Projects
> 

Vincent