Posted to dev@harmony.apache.org by Oliver Deakin <ol...@googlemail.com> on 2007/10/01 12:27:44 UTC

[classlib][icu] Bringing ICU level up to 3.8

Hi all,

I have been looking recently at what it would take for us to step up to 
icu4j 3.8 and thought I would give everyone a heads up on what I have 
discovered.

The first thing is that icu4jni is no longer supported from this release 
onwards. The icu4jni APIs have been incorporated into icu4j and are 
now implemented in pure Java.
Secondly, the Bidi class has also been implemented fully in icu4j now, 
so it is possible for us to also drop icu4c as a dependency and use pure 
icu4j for this functionality.
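
For anyone unfamiliar with what the Bidi class provides, here is a minimal 
sketch. It uses the standard java.text.Bidi API (the public class that 
classlib exposes), not icu4j's own Bidi class, and the names BidiSketch and 
describe() are made up purely for illustration.

```java
import java.text.Bidi;

final class BidiSketch {
    /** Classifies a string as left-to-right, right-to-left, or mixed. */
    static String describe(String text) {
        // The base direction is taken from the first strong character,
        // falling back to left-to-right if there is none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isLeftToRight()) {
            return "LTR";
        }
        if (bidi.isRightToLeft()) {
            return "RTL";
        }
        return "MIXED";
    }

    public static void main(String[] args) {
        String latin = "abc";
        String hebrew = "\u05D0\u05D1\u05D2"; // three Hebrew letters
        System.out.println(describe(latin));
        System.out.println(describe(hebrew));
        System.out.println(describe(latin + " " + hebrew));
    }
}
```

Tests of this kind of direction and run analysis are what would exercise the 
pure Java implementation once the native code is gone.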

The major advantage I see of moving to pure icu4j 3.8 is that we no 
longer need to maintain prebuilt binaries of the icu4c and icu4jni 
libraries across all platforms in our repository. This simplifies the 
process of upgrading to new versions of icu and also allows us to move 
to new platforms with greater ease.

I am currently testing a patch to switch over to icu 3.8 and completely 
remove the need for icu4c/jni. I have discovered a couple of bugs in the 
new Bidi functionality [1], which I have raised on the icu dev list and 
which are in the process of being fixed. I hope that once they are all 
resolved we will be able to pick up a patched icu4j 3.8 jar for our use.

I'm interested to hear if anyone has any comments/objections to this?

Regards,
Oliver

[1]
http://bugs.icu-project.org/trac/ticket/5952
http://bugs.icu-project.org/trac/ticket/5961

-- 
Oliver Deakin
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Gregory Shimansky <gs...@gmail.com>.
Oliver Deakin wrote:
> Ilya Berezhniuk wrote:
>> I looked through ICU bugs and did not find related bugs.
>> But it's not the first such issue in ICU 3.4 - Gregory spent some time
>> investigating a crash in ICU on Windows (HARMONY-2669).
>>
>> I've just checked DecodingModesTest2 test from HARMONY-4758 with icu4j
>> 3.8, it works fine on Sun VM and DRL VM.
>
> That's great Ilya, thanks for checking! I hope that Gregory's issue 
> might also be resolved if we move to icu4j.

If version 3.8 doesn't have native code, it shouldn't have this bug 
since it is in the native code.

>> 2007/10/2, Oliver Deakin <ol...@googlemail.com>:
>>  
>>> Thanks Ilya - has that bug been raised in the ICU bug system? It would
>>> be good to know if it has been fixed in 3.8!
>>>
>>> Regards,
>>> Oliver
>>>
>>> Ilya Berezhniuk wrote:
>>>    
>>>> Oliver, Pavel,
>>>>
>>>> I support this idea!
>>>> I've recently investigated HARMONY-4758, and have found a bug in 
>>>> current ICU.
>>>> I hope some 3.4 bugs will disappear with migrating to newer ICU 
>>>> version.
>>>>
>>>> Ilya
>>>>
>>>> 2007/10/1, Oliver Deakin <ol...@googlemail.com>:
>>>>
>>>>      
>>>>> Pavel Pervov wrote:
>>>>>
>>>>>        
>>>>>>> IMHO it would be good for classlib to move to pure icu4j for the 
>>>>>>> reasons
>>>>>>> I've stated, plus it would mean that we could entirely remove the 
>>>>>>> native
>>>>>>> code from the text module and also the BidiWrapper class, which I 
>>>>>>> see as
>>>>>>> a bonus.
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>> I do not object here. :)
>>>>>>
>>>>>> There are still a few bugs to be ironed out in icu4j 3.8 before we 
>>>>>> start
>>>>>>
>>>>>>
>>>>>>          
>>>>>>> using it (unless we are willing to put up with almost all the 
>>>>>>> Bidi tests
>>>>>>> failing, which I don't think would be acceptable) so the 
>>>>>>> transition to
>>>>>>> 3.8 would not happen for a little while yet anyway. When the time 
>>>>>>> came,
>>>>>>> I would think it would make sense to move these libraries to 
>>>>>>> DRLVM as
>>>>>>> they would no longer be dependencies of classlib. Perhaps at that 
>>>>>>> point
>>>>>>> it would also be worth stepping the icu4c libraries up to 3.8 so
>>>>>>> classlib and drlvm are at the same level?
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>> I think when we are done with classlib, we file a JIRA on DRLVM 
>>>>>> build to
>>>>>> update ICU4C to the latest version and to move it to DRLVM 
>>>>>> dependencies.
>>>>>>
>>>>>>
>>>>>>           
>>>>> Sounds like the right thing to do, thanks Pavel.
>>>>>
>>>>> Regards,
>>>>> Oliver
>>>>>
>>>>>
>>>>>        
>>>>>> Regards,
>>>>>>
>>>>>>
>>>>>>          
>>>>>>> Oliver
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>> WBR,
>>>>>>
>>>>>>
>>>>>>
>>>>>>           
>>>>> -- 
>>>>> Oliver Deakin
>>>>> Unless stated otherwise above:
>>>>> IBM United Kingdom Limited - Registered in England and Wales with 
>>>>> number 741598.
>>>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire 
>>>>> PO6 3AU
>>>>>
>>>>>
>>>>>
>>>>>         
>>>>       
>>> -- 
>>> Oliver Deakin
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with 
>>> number 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire 
>>> PO6 3AU
>>>
>>>
>>>     
> 


-- 
Gregory


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Ilya Berezhniuk wrote:
> I looked through ICU bugs and did not find related bugs.
> But it's not the first such issue in ICU 3.4 - Gregory spent some time
> investigating a crash in ICU on Windows (HARMONY-2669).
>
> I've just checked DecodingModesTest2 test from HARMONY-4758 with icu4j
> 3.8, it works fine on Sun VM and DRL VM.
>   

That's great Ilya, thanks for checking! I hope that Gregory's issue 
might also be resolved if we move to icu4j.

Regards,
Oliver

> Ilya
>
> 2007/10/2, Oliver Deakin <ol...@googlemail.com>:
>   
>> Thanks Ilya - has that bug been raised in the ICU bug system? It would
>> be good to know if it has been fixed in 3.8!
>>
>> Regards,
>> Oliver
>>
>> Ilya Berezhniuk wrote:
>>     
>>> Oliver, Pavel,
>>>
>>> I support this idea!
>>> I've recently investigated HARMONY-4758, and have found a bug in current ICU.
>>> I hope some 3.4 bugs will disappear with migrating to newer ICU version.
>>>
>>> Ilya
>>>
>>> 2007/10/1, Oliver Deakin <ol...@googlemail.com>:
>>>
>>>       
>>>> Pavel Pervov wrote:
>>>>
>>>>         
>>>>>> IMHO it would be good for classlib to move to pure icu4j for the reasons
>>>>>> I've stated, plus it would mean that we could entirely remove the native
>>>>>> code from the text module and also the BidiWrapper class, which I see as
>>>>>> a bonus.
>>>>>>
>>>>>>
>>>>>>             
>>>>> I do not object here. :)
>>>>>
>>>>> There are still a few bugs to be ironed out in icu4j 3.8 before we start
>>>>>
>>>>>
>>>>>           
>>>>>> using it (unless we are willing to put up with almost all the Bidi tests
>>>>>> failing, which I don't think would be acceptable) so the transition to
>>>>>> 3.8 would not happen for a little while yet anyway. When the time came,
>>>>>> I would think it would make sense to move these libraries to DRLVM as
>>>>>> they would no longer be dependencies of classlib. Perhaps at that point
>>>>>> it would also be worth stepping the icu4c libraries up to 3.8 so
>>>>>> classlib and drlvm are at the same level?
>>>>>>
>>>>>>
>>>>>>             
>>>>> I think when we are done with classlib, we file a JIRA on DRLVM build to
>>>>> update ICU4C to the latest version and to move it to DRLVM dependencies.
>>>>>
>>>>>
>>>>>           
>>>> Sounds like the right thing to do, thanks Pavel.
>>>>
>>>> Regards,
>>>> Oliver
>>>>
>>>>
>>>>         
>>>>> Regards,
>>>>>
>>>>>
>>>>>           
>>>>>> Oliver
>>>>>>
>>>>>>
>>>>>>             
>>>>> WBR,
>>>>>
>>>>>
>>>>>
>>>>>           
>>>> --
>>>> Oliver Deakin
>>>> Unless stated otherwise above:
>>>> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
>>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>>>
>>>>
>>>>
>>>>         
>>>       
>> --
>> Oliver Deakin
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>
>>
>>     

-- 
Oliver Deakin


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Ilya Berezhniuk <il...@gmail.com>.
I looked through ICU bugs and did not find related bugs.
But it's not the first such issue in ICU 3.4 - Gregory spent some time
investigating a crash in ICU on Windows (HARMONY-2669).

I've just checked DecodingModesTest2 test from HARMONY-4758 with icu4j
3.8, it works fine on Sun VM and DRL VM.

Ilya

2007/10/2, Oliver Deakin <ol...@googlemail.com>:
> Thanks Ilya - has that bug been raised in the ICU bug system? It would
> be good to know if it has been fixed in 3.8!
>
> Regards,
> Oliver
>
> Ilya Berezhniuk wrote:
> > Oliver, Pavel,
> >
> > I support this idea!
> > I've recently investigated HARMONY-4758, and have found a bug in current ICU.
> > I hope some 3.4 bugs will disappear with migrating to newer ICU version.
> >
> > Ilya
> >
> > 2007/10/1, Oliver Deakin <ol...@googlemail.com>:
> >
> >> Pavel Pervov wrote:
> >>
> >>>> IMHO it would be good for classlib to move to pure icu4j for the reasons
> >>>> I've stated, plus it would mean that we could entirely remove the native
> >>>> code from the text module and also the BidiWrapper class, which I see as
> >>>> a bonus.
> >>>>
> >>>>
> >>> I do not object here. :)
> >>>
> >>> There are still a few bugs to be ironed out in icu4j 3.8 before we start
> >>>
> >>>
> >>>> using it (unless we are willing to put up with almost all the Bidi tests
> >>>> failing, which I don't think would be acceptable) so the transition to
> >>>> 3.8 would not happen for a little while yet anyway. When the time came,
> >>>> I would think it would make sense to move these libraries to DRLVM as
> >>>> they would no longer be dependencies of classlib. Perhaps at that point
> >>>> it would also be worth stepping the icu4c libraries up to 3.8 so
> >>>> classlib and drlvm are at the same level?
> >>>>
> >>>>
> >>> I think when we are done with classlib, we file a JIRA on DRLVM build to
> >>> update ICU4C to the latest version and to move it to DRLVM dependencies.
> >>>
> >>>
> >> Sounds like the right thing to do, thanks Pavel.
> >>
> >> Regards,
> >> Oliver
> >>
> >>
> >>> Regards,
> >>>
> >>>
> >>>> Oliver
> >>>>
> >>>>
> >>> WBR,
> >>>
> >>>
> >>>
> >> --
> >> Oliver Deakin
> >> Unless stated otherwise above:
> >> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
> >> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
> >>
> >>
> >>
> >
> >
>
> --
> Oliver Deakin
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Thanks Ilya - has that bug been raised in the ICU bug system? It would 
be good to know if it has been fixed in 3.8!

Regards,
Oliver

Ilya Berezhniuk wrote:
> Oliver, Pavel,
>
> I support this idea!
> I've recently investigated HARMONY-4758, and have found a bug in current ICU.
> I hope some 3.4 bugs will disappear with migrating to newer ICU version.
>
> Ilya
>
> 2007/10/1, Oliver Deakin <ol...@googlemail.com>:
>   
>> Pavel Pervov wrote:
>>     
>>>> IMHO it would be good for classlib to move to pure icu4j for the reasons
>>>> I've stated, plus it would mean that we could entirely remove the native
>>>> code from the text module and also the BidiWrapper class, which I see as
>>>> a bonus.
>>>>
>>>>         
>>> I do not object here. :)
>>>
>>> There are still a few bugs to be ironed out in icu4j 3.8 before we start
>>>
>>>       
>>>> using it (unless we are willing to put up with almost all the Bidi tests
>>>> failing, which I don't think would be acceptable) so the transition to
>>>> 3.8 would not happen for a little while yet anyway. When the time came,
>>>> I would think it would make sense to move these libraries to DRLVM as
>>>> they would no longer be dependencies of classlib. Perhaps at that point
>>>> it would also be worth stepping the icu4c libraries up to 3.8 so
>>>> classlib and drlvm are at the same level?
>>>>
>>>>         
>>> I think when we are done with classlib, we file a JIRA on DRLVM build to
>>> update ICU4C to the latest version and to move it to DRLVM dependencies.
>>>
>>>       
>> Sounds like the right thing to do, thanks Pavel.
>>
>> Regards,
>> Oliver
>>
>>     
>>> Regards,
>>>
>>>       
>>>> Oliver
>>>>
>>>>         
>>> WBR,
>>>
>>>
>>>       
>> --
>> Oliver Deakin
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>
>>
>>     
>
>   

-- 
Oliver Deakin


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Ilya Berezhniuk <il...@gmail.com>.
Oliver, Pavel,

I support this idea!
I've recently investigated HARMONY-4758 and found a bug in the current ICU.
I hope some 3.4 bugs will disappear when we migrate to a newer ICU version.

Ilya

2007/10/1, Oliver Deakin <ol...@googlemail.com>:
>
>
> Pavel Pervov wrote:
> >> IMHO it would be good for classlib to move to pure icu4j for the reasons
> >> I've stated, plus it would mean that we could entirely remove the native
> >> code from the text module and also the BidiWrapper class, which I see as
> >> a bonus.
> >>
> >
> >
> > I do not object here. :)
> >
> > There are still a few bugs to be ironed out in icu4j 3.8 before we start
> >
> >> using it (unless we are willing to put up with almost all the Bidi tests
> >> failing, which I don't think would be acceptable) so the transition to
> >> 3.8 would not happen for a little while yet anyway. When the time came,
> >> I would think it would make sense to move these libraries to DRLVM as
> >> they would no longer be dependencies of classlib. Perhaps at that point
> >> it would also be worth stepping the icu4c libraries up to 3.8 so
> >> classlib and drlvm are at the same level?
> >>
> >
> >
> > I think when we are done with classlib, we file a JIRA on DRLVM build to
> > update ICU4C to the latest version and to move it to DRLVM dependencies.
> >
>
> Sounds like the right thing to do, thanks Pavel.
>
> Regards,
> Oliver
>
> > Regards,
> >
> >> Oliver
> >>
> >
> >
> > WBR,
> >
> >
>
> --
> Oliver Deakin
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.

Pavel Pervov wrote:
>> IMHO it would be good for classlib to move to pure icu4j for the reasons
>> I've stated, plus it would mean that we could entirely remove the native
>> code from the text module and also the BidiWrapper class, which I see as
>> a bonus.
>>     
>
>
> I do not object here. :)
>
> There are still a few bugs to be ironed out in icu4j 3.8 before we start
>   
>> using it (unless we are willing to put up with almost all the Bidi tests
>> failing, which I don't think would be acceptable) so the transition to
>> 3.8 would not happen for a little while yet anyway. When the time came,
>> I would think it would make sense to move these libraries to DRLVM as
>> they would no longer be dependencies of classlib. Perhaps at that point
>> it would also be worth stepping the icu4c libraries up to 3.8 so
>> classlib and drlvm are at the same level?
>>     
>
>
> I think when we are done with classlib, we file a JIRA on DRLVM build to
> update ICU4C to the latest version and to move it to DRLVM dependencies.
>   

Sounds like the right thing to do, thanks Pavel.

Regards,
Oliver

> Regards,
>   
>> Oliver
>>     
>
>
> WBR,
>
>   

-- 
Oliver Deakin


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Pavel Pervov <pm...@gmail.com>.
>
> IMHO it would be good for classlib to move to pure icu4j for the reasons
> I've stated, plus it would mean that we could entirely remove the native
> code from the text module and also the BidiWrapper class, which I see as
> a bonus.


I do not object here. :)

There are still a few bugs to be ironed out in icu4j 3.8 before we start
> using it (unless we are willing to put up with almost all the Bidi tests
> failing, which I don't think would be acceptable) so the transition to
> 3.8 would not happen for a little while yet anyway. When the time came,
> I would think it would make sense to move these libraries to DRLVM as
> they would no longer be dependencies of classlib. Perhaps at that point
> it would also be worth stepping the icu4c libraries up to 3.8 so
> classlib and drlvm are at the same level?


I think when we are done with classlib, we file a JIRA on DRLVM build to
update ICU4C to the latest version and to move it to DRLVM dependencies.

Regards,
> Oliver


WBR,

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Thanks Pavel, I didn't realise icu4c was also being used by DRLVM.

IMHO it would be good for classlib to move to pure icu4j for the reasons 
I've stated, plus it would mean that we could entirely remove the native 
code from the text module and also the BidiWrapper class, which I see as 
a bonus.

There are still a few bugs to be ironed out in icu4j 3.8 before we start 
using it (unless we are willing to put up with almost all the Bidi tests 
failing, which I don't think would be acceptable) so the transition to 
3.8 would not happen for a little while yet anyway. When the time came, 
I would think it would make sense to move these libraries to DRLVM as 
they would no longer be dependencies of classlib. Perhaps at that point 
it would also be worth stepping the icu4c libraries up to 3.8 so 
classlib and drlvm are at the same level?

Regards,
Oliver

Pavel Pervov wrote:
> Oliver,
>
> Please, note, that DRLVM uses prebuilt ICU4C from classlib for some internal
> tasks.
> Should we move ICU4C from classlib to DRLVM in repository?
>
> WBR,
>     Pavel.
> On 10/1/07, Oliver Deakin <ol...@googlemail.com> wrote:
>
>   
>> Hi all,
>>
>> I have been looking recently at what it would take for us to step up to
>> icu4j 3.8 and thought I would give everyone a heads up on what I have
>> discovered.
>>
>> The first thing is that icu4jni is no longer supported from this release
>> onwards. The icu4jni api have been incorporated into icu4j and are
>> implemented in pure Java now.
>> Secondly, the Bidi class has also been implemented fully in icu4j now,
>> so it is possible for us to also drop icu4c as a dependency and use pure
>> icu4j for this functionality.
>>
>> The major advantage I see of moving to pure icu4j 3.8 is that we no
>> longer need to maintain prebuilt binaries of the icu4c and icu4jni
>> libraries across all platforms in our repository. This simplifies the
>> process of upgrading to new versions of icu and also allows us to move
>> to new platforms with greater ease.
>>
>> I am currently testing a patch to switch over to icu 3.8 and completely
>> remove the need for icu4c/jni. I have discovered a couple of bugs in the
>> new Bidi functionality [1] which I have raised on the icu dev list and
>> are in the process of being fixed. I hope that once they are all
>> resolved we will be able to pick up a patched icu4j 3.8 jar for our use.
>>
>> I'm interested to hear if anyone has any comments/objections to this?
>>
>> Regards,
>> Oliver
>>
>> [1]
>> http://bugs.icu-project.org/trac/ticket/5952
>> http://bugs.icu-project.org/trac/ticket/5961
>>
>> --
>> Oliver Deakin
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number
>> 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>
>>
>>     
>
>
>   

-- 
Oliver Deakin


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Pavel Pervov <pm...@gmail.com>.
Oliver,

Please note that DRLVM uses prebuilt ICU4C from classlib for some internal
tasks.
Should we move ICU4C from classlib to DRLVM in the repository?

WBR,
    Pavel.
On 10/1/07, Oliver Deakin <ol...@googlemail.com> wrote:

> Hi all,
>
> I have been looking recently at what it would take for us to step up to
> icu4j 3.8 and thought I would give everyone a heads up on what I have
> discovered.
>
> The first thing is that icu4jni is no longer supported from this release
> onwards. The icu4jni api have been incorporated into icu4j and are
> implemented in pure Java now.
> Secondly, the Bidi class has also been implemented fully in icu4j now,
> so it is possible for us to also drop icu4c as a dependency and use pure
> icu4j for this functionality.
>
> The major advantage I see of moving to pure icu4j 3.8 is that we no
> longer need to maintain prebuilt binaries of the icu4c and icu4jni
> libraries across all platforms in our repository. This simplifies the
> process of upgrading to new versions of icu and also allows us to move
> to new platforms with greater ease.
>
> I am currently testing a patch to switch over to icu 3.8 and completely
> remove the need for icu4c/jni. I have discovered a couple of bugs in the
> new Bidi functionality [1] which I have raised on the icu dev list and
> are in the process of being fixed. I hope that once they are all
> resolved we will be able to pick up a patched icu4j 3.8 jar for our use.
>
> I'm interested to hear if anyone has any comments/objections to this?
>
> Regards,
> Oliver
>
> [1]
> http://bugs.icu-project.org/trac/ticket/5952
> http://bugs.icu-project.org/trac/ticket/5961
>
> --
> Oliver Deakin
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>


-- 
Pavel Pervov,
Intel Enterprise Solutions Software Division

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Alexei Zakharov wrote:
> Hi,
>
> AFAIK internal providers implementation came from HARMONY-3593
> ([classlib][nio_char] Contribution of charset encoders/decoders for
> nio_char module). And as far as I understand these providers still
> have some benefits - they're faster than ICU's ones and fix some known
> ICU issues (see HARMONY-3307 for example). Original announcement
> message can be found at
> http://article.gmane.org/gmane.comp.java.harmony.devel/25623
>
> In this way, I think we should be careful with switching back to
> pure-ICU without detailed investigation.
>   

Thanks for the explanation Alexei. I agree, if there is a clear benefit 
(performance or otherwise) to using our own charsets over those provided 
by ICU, then we will likely want to keep the setup we currently have or 
look at ways to utilise both sets of charsets if this is appropriate 
(for example, if ICU charsets give better performance under certain 
conditions).

Regards,
Oliver

> Thanks,
> Alexei
>
> 2007/10/8, Ilya Berezhniuk <il...@gmail.com>:
>   
>> Agree, there is no problem - Charset class adds internal charsets
>> first, and when adding charsets from ICU it skips existing ones,
>> comparing by charset canonical name.
>>
>> Probably the reason is easy extensibility - external charset provider
>> can be simply substituted in configuration file, and also several
>> external providers could be specified.
>>
>>
>> 2007/10/8, Oliver Deakin <ol...@googlemail.com>:
>>     
>>> Ilya Berezhniuk wrote:
>>>       
>>>>>> BTW, We keep some resource bundle classes in luni, such as Locale and
>>>>>> Currency, which used by luni and text module. These data are aslo
>>>>>> included in icu, I suggest to remove this overlap, just keep one of
>>>>>> them.
>>>>>>
>>>>>>             
>>>>> Agreed - if we can use the ICU version of these resources then IMHO we
>>>>> should do it.
>>>>>
>>>>>           
>>>> There is similar place in java/nio/charset/Charset: when
>>>> availableCharsets method prepares charsets it concatenates ICU
>>>> charsets with a set of internal charsets returned by
>>>> org.apache.harmony.niochar.CharsetProviderImpl class. But I'm not
>>>> quite sure that it should be fixed; probably there were some reasons
>>>> for such approach.
>>>>
>>>>         
>>> It would be interesting to know the reasons for this. Were these
>>> charsets missing from ICU or were the ICU versions of those charsets
>>> lacking something? I would think that if it was possible for us to feed
>>> those issues back to ICU and have them maintain these resources for us
>>> it would be a good thing. However I don't see any problem with us
>>> keeping the Harmony versions of these resources if need be - I don't
>>> feel strongly either way.
>>>       
>
>   

-- 
Oliver Deakin


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Alexei Zakharov <al...@gmail.com>.
Hi,

AFAIK internal providers implementation came from HARMONY-3593
([classlib][nio_char] Contribution of charset encoders/decoders for
nio_char module). And as far as I understand these providers still
have some benefits - they're faster than ICU's ones and fix some known
ICU issues (see HARMONY-3307 for example). Original announcement
message can be found at
http://article.gmane.org/gmane.comp.java.harmony.devel/25623

Given this, I think we should be careful about switching back to
pure-ICU without detailed investigation.

Thanks,
Alexei

2007/10/8, Ilya Berezhniuk <il...@gmail.com>:
> Agree, there is no problem - Charset class adds internal charsets
> first, and when adding charsets from ICU it skips existing ones,
> comparing by charset canonical name.
>
> Probably the reason is easy extensibility - external charset provider
> can be simply substituted in configuration file, and also several
> external providers could be specified.
>
>
> 2007/10/8, Oliver Deakin <ol...@googlemail.com>:
> > Ilya Berezhniuk wrote:
> > >>> BTW, We keep some resource bundle classes in luni, such as Locale and
> > >>> Currency, which used by luni and text module. These data are aslo
> > >>> included in icu, I suggest to remove this overlap, just keep one of
> > >>> them.
> > >>>
> > >> Agreed - if we can use the ICU version of these resources then IMHO we
> > >> should do it.
> > >>
> > >
> > > There is similar place in java/nio/charset/Charset: when
> > > availableCharsets method prepares charsets it concatenates ICU
> > > charsets with a set of internal charsets returned by
> > > org.apache.harmony.niochar.CharsetProviderImpl class. But I'm not
> > > quite sure that it should be fixed; probably there were some reasons
> > > for such approach.
> > >
> >
> > It would be interesting to know the reasons for this. Were these
> > charsets missing from ICU or were the ICU versions of those charsets
> > lacking something? I would think that if it was possible for us to feed
> > those issues back to ICU and have them maintain these resources for us
> > it would be a good thing. However I don't see any problem with us
> > keeping the Harmony versions of these resources if need be - I don't
> > feel strongly either way.

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Ilya Berezhniuk <il...@gmail.com>.
Agreed, there is no problem - the Charset class adds internal charsets
first, and when adding charsets from ICU it skips existing ones,
comparing by charset canonical name.

Probably the reason is easy extensibility - an external charset provider
can simply be substituted in the configuration file, and several
external providers can also be specified.
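
The merge order described above could be sketched roughly as follows. This
is not the actual Harmony Charset code: the Entry type, the provider labels
and the merge() helper are hypothetical stand-ins, and only the "internal
first, then skip duplicates by canonical name" rule is taken from the
description.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class CharsetMergeSketch {
    /** Hypothetical stand-in for a charset: its canonical name and where it came from. */
    static final class Entry {
        final String canonicalName;
        final String provider;

        Entry(String canonicalName, String provider) {
            this.canonicalName = canonicalName;
            this.provider = provider;
        }
    }

    /** Adds internal charsets first, then external ones whose canonical name is not yet present. */
    static SortedMap<String, Entry> merge(List<Entry> internal, List<Entry> external) {
        // Charset names are case-insensitive, so the map compares keys that way.
        SortedMap<String, Entry> merged = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Entry e : internal) {
            merged.put(e.canonicalName, e);
        }
        for (Entry e : external) {
            if (!merged.containsKey(e.canonicalName)) {
                merged.put(e.canonicalName, e);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Entry> internal = Arrays.asList(
                new Entry("UTF-8", "internal"),
                new Entry("windows-1251", "internal"));
        List<Entry> external = Arrays.asList(
                new Entry("UTF-8", "icu"),
                new Entry("ISO-2022-JP", "icu"));
        for (Entry e : merge(internal, external).values()) {
            System.out.println(e.canonicalName + " <- " + e.provider);
        }
    }
}
```

With this ordering an internal charset always wins over an ICU one of the
same name, so dropping an internal charset is enough to fall back to ICU's.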


2007/10/8, Oliver Deakin <ol...@googlemail.com>:
> Ilya Berezhniuk wrote:
> >>> BTW, We keep some resource bundle classes in luni, such as Locale and
> >>> Currency, which are used by luni and text module. These data are also
> >>> included in icu, I suggest to remove this overlap, just keep one of
> >>> them.
> >>>
> >> Agreed - if we can use the ICU version of these resources then IMHO we
> >> should do it.
> >>
> >
> > There is similar place in java/nio/charset/Charset: when
> > availableCharsets method prepares charsets it concatenates ICU
> > charsets with a set of internal charsets returned by
> > org.apache.harmony.niochar.CharsetProviderImpl class. But I'm not
> > quite sure that it should be fixed; probably there were some reasons
> > for such approach.
> >
>
> It would be interesting to know the reasons for this. Were these
> charsets missing from ICU or were the ICU versions of those charsets
> lacking something? I would think that if it was possible for us to feed
> those issues back to ICU and have them maintain these resources for us
> it would be a good thing. However I don't see any problem with us
> keeping the Harmony versions of these resources if need be - I don't
> feel strongly either way.
>
> Regards,
> Oliver
>
> > Ilya
> >
> >
>
> --
> Oliver Deakin
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>


-- 

Ilya

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Ilya Berezhniuk wrote:
>>> BTW, We keep some resource bundle classes in luni, such as Locale and
>>> Currency, which are used by luni and text module. These data are also
>>> included in icu, I suggest to remove this overlap, just keep one of
>>> them.
>>>       
>> Agreed - if we can use the ICU version of these resources then IMHO we
>> should do it.
>>     
>
> There is similar place in java/nio/charset/Charset: when
> availableCharsets method prepares charsets it concatenates ICU
> charsets with a set of internal charsets returned by
> org.apache.harmony.niochar.CharsetProviderImpl class. But I'm not
> quite sure that it should be fixed; probably there were some reasons
> for such approach.
>   

It would be interesting to know the reasons for this. Were these 
charsets missing from ICU or were the ICU versions of those charsets 
lacking something? I would think that if it was possible for us to feed 
those issues back to ICU and have them maintain these resources for us 
it would be a good thing. However I don't see any problem with us 
keeping the Harmony versions of these resources if need be - I don't 
feel strongly either way.

Regards,
Oliver

> Ilya
>
>   

-- 
Oliver Deakin


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Ilya Berezhniuk <il...@gmail.com>.
> > BTW, We keep some resource bundle classes in luni, such as Locale and
> > Currency, which are used by luni and text module. These data are also
> > included in icu, I suggest to remove this overlap, just keep one of
> > them.
>
> Agreed - if we can use the ICU version of these resources then IMHO we
> should do it.

There is a similar place in java/nio/charset/Charset: when the
availableCharsets method prepares charsets, it concatenates ICU
charsets with a set of internal charsets returned by the
org.apache.harmony.niochar.CharsetProviderImpl class. But I'm not
quite sure that it should be fixed; there were probably some reasons
for this approach.
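
To make the concatenation concrete, here is a minimal sketch of merging
charsets from two sources into one sorted map. It is not the Harmony
code: the class name MergedCharsets, the merge method and the two sample
lists are hypothetical, and the "first source wins" rule is only an
assumption about how a name clash might be resolved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class MergedCharsets {
    // Merge charsets from several sources into one map keyed by canonical name.
    // Sources are consulted in order; the first one to supply a name wins.
    static SortedMap<String, Charset> merge(List<Iterator<Charset>> sources) {
        SortedMap<String, Charset> merged = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Iterator<Charset> source : sources) {
            while (source.hasNext()) {
                Charset cs = source.next();
                merged.putIfAbsent(cs.name(), cs);
            }
        }
        return Collections.unmodifiableSortedMap(merged);
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the "internal" and "ICU" charset lists.
        List<Charset> internal = List.of(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        List<Charset> icu = List.of(StandardCharsets.UTF_8, StandardCharsets.UTF_16);
        SortedMap<String, Charset> all = merge(List.of(internal.iterator(), icu.iterator()));
        System.out.println(all.keySet());
    }
}
```

With this shape, a charset offered by both sources appears only once,
which is the overlap in question.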

Ilya

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Thanks for these results Alexei - it's interesting to see that icu4j 
does not lag far behind icu4jni even on such a large conversion.

I have discovered that ICU4J 3.8 does not currently support ISO-2022
charsets [1], which causes one test
(tests.api.java.io.InputStreamReaderTest.test_read()) to fail. This
should only be temporary and I do not see it as a major issue.
However, I am not familiar with this charset and so cannot fully
gauge the impact of its absence on the community. Would this lack of
support be an issue?
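
For anyone wanting to check what a given runtime offers, a small probe
along these lines may help. It is only a sketch: the class name
Iso2022Check and the list of names probed are my own choices, and it
uses nothing beyond the standard java.nio.charset.Charset API.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class Iso2022Check {
    // Report, for each name, whether the running JVM can supply that charset.
    static Map<String, Boolean> probe(String... names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, Charset.isSupported(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // The ISO-2022 variants listed here are examples, not an exhaustive set.
        probe("ISO-2022-JP", "ISO-2022-KR", "ISO-2022-CN", "UTF-8")
            .forEach((name, ok) -> System.out.println(name + " supported: " + ok));
    }
}
```

The output depends entirely on the class library and providers
installed, so no expected result is shown.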

If the short-term lack of ISO-2022 support is not a problem, then I'd
like to move ahead and use icu4j 3.8 exclusively, removing the icu4jni
and icu4c dependencies in classlib. I will give it a couple of days and,
if there are no objections, I will go ahead and apply the changes required.

Regards,
Oliver

[1] http://bugs.icu-project.org/trac/ticket/5791

Alexei Zakharov wrote:
> Hi Oliver,
>
> I've created a small benchmark too. It takes Book One of Leo Tolstoy's
> "War and Peace" as input and converts it from Russian CP-1251 to
> UTF-16 (10 times) and back (also 10 times). You can find the
> benchmark's source code and a build file at [1].  The first difference
> from your benchmark is the language and encoding - Russian in my case.
> The second difference is the set of tested VMs - I've run the
> benchmark on the RI, J9 and DRLVM.
>
> You can find the results below. BTW, the results show that in this
> particular test our internal providers (from the
> org.apache.harmony.niochar.charset package) are faster than both
> versions of ICU. Another interesting fact is the terrible ICU
> performance on DRLVM, whereas on J9 it works rather fast. This is
> something that should be fixed IMO (the bad performance on DRLVM, I
> mean). And finally, yes, ICU4JNI is a little bit faster than ICU4J in
> this test. However, "War and Peace" is a rather big book (the paper
> version of the first part contains about 400 pages, so 10 repetitions
> come to about 4000 pages), and yet the difference in numbers is not so
> big.
>
> [1] http://people.apache.org/~ayza/icu_experiments/
>
>
> RI
> ---
> Built-in
> <sun.nio.cs.MS1251$Decoder> Decoding time: 571 millis
> <sun.nio.cs.MS1251$Encoder> Encoding time: 351 millis
>
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 430 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 551 millis
>
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 401 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 540 millis
>
> J9
> ---
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 231 millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 430 millis
>
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 781 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 620 millis
>
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 561 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 371 millis
>
>
> DRLVM
> ---
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 351 millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 540 millis
>
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6660 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 1071 millis
>
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6179 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 451 millis
>
> With Best Regards,
> Alexei
>
> 2007/10/11, Oliver Deakin <ol...@googlemail.com>:
>   
>> Tony Wu wrote:
>>     
>>> On 10/8/07, Oliver Deakin <ol...@googlemail.com> wrote:
>>>       
>>>> Are there any particular
>>>> benchmarks you had in mind for this?
>>>>
>>>>
>>>>         
>>> ya, there is a micro benchmark on HARMONY-3709
>>>
>>>
>>>       
>> <SNIP!>
>>
>> I have run the micro benchmark on Harmony with its current ICU
>> configuration (icu4jni 3.4.4) and on Harmony with pure icu4j 3.8. The
>> results are pretty much as expected - for small jobs icu4j is
>> significantly faster, for large jobs icu4jni comes out on top (full
>> results at the end of this email). It seems that performance-wise there
>> are benefits on both sides depending on the work we are doing.
>>
>> Regards,
>> Oliver
>>     
>
>   

-- 
Oliver Deakin


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tim Ellison <t....@gmail.com>.
Vladimir Strigun wrote:
> Sorry, it seems my results were confusing for you :) I hope this looks better:
> 
> Decoding time(j): 2844 millis for  8bytes  (3000000 iterations)
> Decoding time(j): 1797 millis for  128bytes  (800000 iterations)
> Decoding time(j): 1578 millis for   8k       (20000 iterations)
> Decoding time(j): 1203 millis for  16k      (8000 iterations)
> Decoding time(j): 1125 millis for  128K   (1000 iterations)
> 
> Decoding time(d): 5328 millis for  8bytes (3000000 iterations)
> Decoding time(d): 1906 millis for 128bytes (800000 iterations)
> Decoding time(d):  797 millis for  8k     (20000 iterations)
> Decoding time(d):  609 millis for 16k    (8000 iterations)
> Decoding time(d):  672 millis for 128k  (1000 iterations)

ah, that's an important piece of information :-)

Thanks
Tim

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Vladimir Strigun <vs...@gmail.com>.
On 10/18/07, Tim Ellison <t....@gmail.com> wrote:
> Vladimir Strigun wrote:
> > I'm a little bit confused about the results.
>
> and I'm confused by your numbers below<g> so we match
>
> > I tried to run the test (slightly modified) on DRLVM and got the following results:
>
> Let me just chop out the class name to make it more readable...
>
> > Decoding time(j): 2844 millis
> > Decoding time(j): 1797 millis
> > Decoding time(j): 1578 millis
> > Decoding time(j): 1203 millis
> > Decoding time(j): 1125 millis
> > Decoding time(d): 5328 millis
> > Decoding time(d): 1906 millis
> > Decoding time(d): 797 millis
> > Decoding time(d): 609 millis
> > Decoding time(d): 672 millis
> >
> > Input lengths were 8 bytes, 128 bytes, 8k, 16k and 128k respectively.
>
> Though presumably in the opposite order, i.e.
>
> Decoding time(j): 2844 millis for 128k =  22 ms/kb
> Decoding time(j): 1797 millis for  16k = 112 ms/kb
> Decoding time(j): 1578 millis for   8k = 197 ms/kb
> Decoding time(j): 1203 millis
> Decoding time(j): 1125 millis really took >1sec for 8 bytes?! no...
>
> Decoding time(d): 5328 millis for 128k =  42 ms/kb
> Decoding time(d): 1906 millis for  16k = 119 ms/kb
> Decoding time(d):  797 millis for 8k = 100 ms/kb
> Decoding time(d):  609 millis
> Decoding time(d):  672 millis

Sorry, it seems my results were confusing for you :) I hope this looks better:

Decoding time(j): 2844 millis for  8bytes  (3000000 iterations)
Decoding time(j): 1797 millis for  128bytes  (800000 iterations)
Decoding time(j): 1578 millis for   8k       (20000 iterations)
Decoding time(j): 1203 millis for  16k      (8000 iterations)
Decoding time(j): 1125 millis for  128K   (1000 iterations)

Decoding time(d): 5328 millis for  8bytes (3000000 iterations)
Decoding time(d): 1906 millis for 128bytes (800000 iterations)
Decoding time(d):  797 millis for  8k     (20000 iterations)
Decoding time(d):  609 millis for 16k    (8000 iterations)
Decoding time(d):  672 millis for 128k  (1000 iterations)
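
Since each row uses a different input size and iteration count, the raw
millisecond figures are hard to compare directly. One way to normalise
them is throughput in MB/s, sketched below. The class name
DecodeThroughput and the helper mbPerSecond are hypothetical, and the
sketch assumes "k" means 1024 bytes; the timings are copied from the
table above.

```java
public class DecodeThroughput {
    // Megabytes decoded per second: total bytes processed divided by elapsed time.
    static double mbPerSecond(long inputBytes, long iterations, long millis) {
        double totalBytes = (double) inputBytes * iterations;
        return totalBytes / (1024.0 * 1024.0) / (millis / 1000.0);
    }

    public static void main(String[] args) {
        // Assumption: "k" in the table means 1024 bytes.
        long[] sizes = {8, 128, 8 * 1024, 16 * 1024, 128 * 1024};
        long[] iterations = {3_000_000, 800_000, 20_000, 8_000, 1_000};
        long[] heapMillis = {2844, 1797, 1578, 1203, 1125};   // (j) rows
        long[] directMillis = {5328, 1906, 797, 609, 672};    // (d) rows

        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("%7d bytes: heap %.1f MB/s, direct %.1f MB/s%n",
                    sizes[i],
                    mbPerSecond(sizes[i], iterations[i], heapMillis[i]),
                    mbPerSecond(sizes[i], iterations[i], directMillis[i]));
        }
    }
}
```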

Thanks.
Vladimir.

>
> > (j) means java buffer, (d) - direct byte buffer. It seems that native
> > decoding is faster for big buffers, whereas the Java part works fine
> > for small input.
>
> None of the (j) numbers look good for small input, clearly I have
> misunderstood.
>
> Regards,
> Tim
>

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tim Ellison <t....@gmail.com>.
Vladimir Strigun wrote:
> I'm a little bit confused about the results.

and I'm confused by your numbers below<g> so we match

> I tried to run the test (slightly modified) on DRLVM and got the following results:

Let me just chop out the class name to make it more readable...

> Decoding time(j): 2844 millis
> Decoding time(j): 1797 millis
> Decoding time(j): 1578 millis
> Decoding time(j): 1203 millis
> Decoding time(j): 1125 millis
> Decoding time(d): 5328 millis
> Decoding time(d): 1906 millis
> Decoding time(d): 797 millis
> Decoding time(d): 609 millis
> Decoding time(d): 672 millis
> 
> Input lengths were 8 bytes, 128 bytes, 8k, 16k and 128k respectively.

Though presumably in the opposite order, i.e.

Decoding time(j): 2844 millis for 128k =  22 ms/kb
Decoding time(j): 1797 millis for  16k = 112 ms/kb
Decoding time(j): 1578 millis for   8k = 197 ms/kb
Decoding time(j): 1203 millis
Decoding time(j): 1125 millis really took >1sec for 8 bytes?! no...

Decoding time(d): 5328 millis for 128k =  42 ms/kb
Decoding time(d): 1906 millis for  16k = 119 ms/kb
Decoding time(d):  797 millis for   8k = 100 ms/kb
Decoding time(d):  609 millis
Decoding time(d):  672 millis


> (j) means java buffer, (d) - direct byte buffer. It seems that native
> decoding is faster for big buffers, whereas the Java part works fine
> for small input.

None of the (j) numbers look good for small input, clearly I have
misunderstood.

Regards,
Tim

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Vladimir Strigun <vs...@gmail.com>.
On 10/17/07, Tim Ellison <t....@gmail.com> wrote:
> Tim Ellison wrote:
> > (Left as an exercise for the reader <g>)
>
> Feeling a bit guilty about the cop-out...
>
> I slightly modified Alexei's test case to
>  - include some warm-up encode/decode loops to get the methods
>   jitted to a reasonable level,
>  - read the data into a direct byte buffer, and then into a
>   'regular' byte buffer, e.g. one allocated in the Java heap,
>  - I then looked at the effect of converting a short string (129 chars)
>
> I was running this on the IBM VME, and here's what I got (below).
>
> Interestingly the Java decoder was faster on the long string than the
> native code.  The others are sufficiently similar to imply to me that we
> should just keep it all in Java.
>
>
> === long string
>
> Read chars = 3285165
> 10 loops warm up
> 10 loops timed
>
> Direct ByteBuffer
>
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 781
> millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 571
> millis
>
>
> Java Heap Byte Buffer
>
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 430
> millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 521
> millis
>
>
> === short string
>
> Read chars = 129
> 1000 loops warm-up
> 10000 loops timed
>
>
> Direct ByteBuffer
>
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 10
> millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 0 millis
>
>
> Java Heap Byte Buffer
>
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 10
> millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 0 millis
>

I'm a little bit confused about the results.
I tried to run the test (slightly modified) on DRLVM and got the following results:
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time(j):
2844 millis
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time(j):
1797 millis
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time(j):
1578 millis
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time(j):
1203 millis
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time(j):
1125 millis
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time(d):
5328 millis
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time(d):
1906 millis
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time(d):
797 millis
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time(d):
609 millis
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time(d):
672 millis

Input lengths were 8 bytes, 128 bytes, 8k, 16k and 128k respectively.
(j) means java buffer, (d) - direct byte buffer. It seems that native
decoding is faster for big buffers, whereas the Java part works fine
for small input.

Thanks.
Vladimir.

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tim Ellison <t....@gmail.com>.
Mark Hindess wrote:
> On 17 October 2007 at 16:32, Tim Ellison <t....@gmail.com> wrote:
>> Tim Ellison wrote:
>> I was running this on the IBM VME, and here's what I got (below).
>>
>> Interestingly the Java decoder was faster on the long string than the
>> native code.  The others are sufficiently similar to imply to me that
>> we should just keep it all in Java.
> 
> You mean remove the heuristic and remove the intel-contributed native
> code?  I guess that seems reasonable given these results; it would
> enable us to reduce the size of the code base (and jre footprint
> as discussed elsewhere) and concentrate our efforts on the java
> implementation.

Well I'm not quite there yet.  I was running on the IBM VME and only did
a modicum of testing on my uniprocessor laptop, so I would expect to get
a more compelling case before discarding any existing code.

> Of course, this is rather dependent on us being able to achieve similar
> results on DRLVM - so it would be interesting to see these results for
> that VM too.

Agreed, and on different OS / CPU combinations.

It may be that I am measuring the capabilities of the JIT, which
certainly makes a big difference:

(WinXP, Centrino, large string, w/ warm up cycles, IBM VME)

Jit off, non-direct buffer:

  Decoding time: 2193 millis
  Encoding time: 2634 millis

Jit off, direct buffer:

  Decoding time: 771 millis
  Encoding time: 2624 millis    <-- looks strange


Jit on, direct buffer:
  Decoding time: 751 millis
  Encoding time: 461 millis

Jit on, non-direct buffer:
  Decoding time: 420 millis
  Encoding time: 481 millis


Regards,
Tim

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Mark Hindess <ma...@googlemail.com>.
On 17 October 2007 at 16:32, Tim Ellison <t....@gmail.com> wrote:
> Tim Ellison wrote:
> > (Left as an exercise for the reader <g>)
> 
> Feeling a bit guilty about the cop-out...

And so you should ;-)

> I slightly modified Alexei's test case to
>  - include some warm-up encode/decode loops to get the methods
>    jitted to a reasonable level,
>  - read the data into a direct byte buffer, and then into a
>    'regular' byte buffer, e.g. one allocated in the Java heap,
>  - I then looked at the effect of converting a short string (129 chars)

I'd have gone with the following 110 character string myself:

  "I took a speed-reading course and read War and Peace in twenty
  minutes.  It involves Russia." - Woody Allen.

> I was running this on the IBM VME, and here's what I got (below).
>
> Interestingly the Java decoder was faster on the long string than the
> native code.  The others are sufficiently similar to imply to me that
> we should just keep it all in Java.

You mean remove the heuristic and remove the Intel-contributed native
code?  I guess that seems reasonable given these results; it would
enable us to reduce the size of the code base (and JRE footprint
as discussed elsewhere) and concentrate our efforts on the Java
implementation.

Of course, this is rather dependent on us being able to achieve similar
results on DRLVM - so it would be interesting to see these results for
that VM too.

Regards,
 Mark.

> === long string
> 
> Read chars = 3285165
> 10 loops warm up
> 10 loops timed
> 
> Direct ByteBuffer
> 
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 781
> millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 571
> millis
> 
> 
> Java Heap Byte Buffer
> 
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 430
> millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 521
> millis
> 
> 
> === short string
> 
> Read chars = 129
> 1000 loops warm-up
> 10000 loops timed
> 
> 
> Direct ByteBuffer
> 
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 10
> millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 0 millis
> 
> 
> Java Heap Byte Buffer
> 
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 10
> millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 0 millis
> 



Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tim Ellison <t....@gmail.com>.
Tim Ellison wrote:
> (Left as an exercise for the reader <g>)

Feeling a bit guilty about the cop-out...

I slightly modified Alexei's test case to
 - include some warm-up encode/decode loops to get the methods
   jitted to a reasonable level,
 - read the data into a direct byte buffer, and then into a
   'regular' byte buffer, i.e. one allocated in the Java heap,
 - I then looked at the effect of converting a short string (129 chars)
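
The three changes listed above can be sketched roughly as follows. This
is not the modified test case itself: the class BufferKindBenchmark and
its helpers are hypothetical, the input is synthetic bytes rather than
the book text, and the fallback to ISO-8859-1 is only there in case the
runtime lacks windows-1251.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BufferKindBenchmark {
    static long sink; // accumulates decoded lengths so the work cannot be optimised away

    // Synthetic input: bytes 0xC0..0xFF, which are Cyrillic letters in CP-1251.
    static byte[] sampleBytes(int length) {
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            data[i] = (byte) (0xC0 + (i % 64));
        }
        return data;
    }

    // Copy the data into the given buffer and make it ready for reading.
    static ByteBuffer fill(ByteBuffer buf, byte[] data) {
        buf.put(data);
        buf.flip();
        return buf;
    }

    // Decode the same bytes `loops` times and return the elapsed milliseconds.
    static long timeDecode(Charset cs, ByteBuffer in, int loops) {
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            // duplicate() shares the content but has its own position
            sink += cs.decode(in.duplicate()).length();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Charset cs = Charset.isSupported("windows-1251")
                ? Charset.forName("windows-1251") : StandardCharsets.ISO_8859_1;
        byte[] data = sampleBytes(129); // short string case
        ByteBuffer heap = fill(ByteBuffer.allocate(data.length), data);
        ByteBuffer direct = fill(ByteBuffer.allocateDirect(data.length), data);

        // Warm-up loops so the methods are compiled before timing starts.
        timeDecode(cs, heap, 1000);
        timeDecode(cs, direct, 1000);

        System.out.println("Java heap buffer: " + timeDecode(cs, heap, 10000) + " millis");
        System.out.println("Direct buffer:    " + timeDecode(cs, direct, 10000) + " millis");
    }
}
```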

I was running this on the IBM VME, and here's what I got (below).

Interestingly the Java decoder was faster on the long string than the
native code.  The others are sufficiently similar to imply to me that we
should just keep it all in Java.


=== long string

Read chars = 3285165
10 loops warm up
10 loops timed

Direct ByteBuffer

Built-in
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 781
millis
<org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 571
millis


Java Heap Byte Buffer

Built-in
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 430
millis
<org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 521
millis


=== short string

Read chars = 129
1000 loops warm-up
10000 loops timed


Direct ByteBuffer

Built-in
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 10
millis
<org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 0 millis


Java Heap Byte Buffer

Built-in
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 10
millis
<org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 0 millis


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tim Ellison <t....@gmail.com>.
Alexei Zakharov wrote:
> I've created a small benchmark too. It takes Book One of Leo Tolstoy's
> "War and Peace" as input and converts it from Russian CP-1251 to
> UTF-16 (10 times) and back (also 10 times). You can find the
> benchmark's source code and a build file at [1].  The first difference
> from your benchmark is the language and encoding - Russian in my case.
> The second difference is the set of tested VMs - I've run the
> benchmark on the RI, J9 and DRLVM.

Interesting numbers.  How common do you think converting that size of
data is in real-world applications?  I'm only guessing, but I would think
that most conversions are short strings, with perhaps the occasional
long XML document.  While the converters should not go pathological on
such a long string, I am concerned that we optimize for the right case.

> You can find the results below. BTW, the results show that in this
> particular test our internal providers (from the
> org.apache.harmony.niochar.charset package) are faster than both
> versions of ICU. Another interesting fact is the terrible ICU
> performance on DRLVM, whereas on J9 it works rather fast. This is
> something that should be fixed IMO (the bad performance on DRLVM, I
> mean). And finally, yes, ICU4JNI is a little bit faster than ICU4J in
> this test. However, "War and Peace" is a rather big book (the paper
> version of the first part contains about 400 pages, so 10 repetitions
> come to about 4000 pages), and yet the difference in numbers is not so
> big.
> 
> [1] http://people.apache.org/~ayza/icu_experiments/

I had a quick look at the benchmark you were using, and have a couple of
observations:

I see that it uses a MappedByteBuffer (which is a type of direct
ByteBuffer), so that will exercise the *native* code encoding/decoding
loop in Harmony.  I wonder how things change if you use a Java-heap
ByteBuffer so it uses the *Java* code encoding/decoding loop.

It would also be interesting to vary the inputs across a number of
string lengths to see if there is a reasonable heuristic we should add
to avoid incurring the JNI overhead even when the buffer is direct.
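
To illustrate the kind of heuristic I have in mind, here is a sketch
only. The class DecodePathChooser, the Path enum and the 4k threshold
are all made up for illustration; a real cut-over point would have to
come from measurements like the ones discussed here.

```java
import java.nio.ByteBuffer;

public class DecodePathChooser {
    // Hypothetical cut-over point; a real value would have to be measured.
    static final int NATIVE_THRESHOLD_BYTES = 4 * 1024;

    enum Path { JAVA_LOOP, NATIVE_LOOP }

    // Take the native loop only for direct buffers large enough that the
    // per-call JNI overhead is likely to be repaid; otherwise stay in Java.
    static Path choose(ByteBuffer in) {
        if (in.isDirect() && in.remaining() >= NATIVE_THRESHOLD_BYTES) {
            return Path.NATIVE_LOOP;
        }
        return Path.JAVA_LOOP;
    }

    public static void main(String[] args) {
        System.out.println(choose(ByteBuffer.allocateDirect(8)));     // small direct buffer
        System.out.println(choose(ByteBuffer.allocateDirect(8192)));  // large direct buffer
        System.out.println(choose(ByteBuffer.allocate(8192)));        // heap buffer
    }
}
```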

So the numbers you show are useful, but it would be even more useful to
see graphs of time vs. input size for direct and non-direct byte buffers
too.

(Left as an exercise for the reader <g>)

Regards,
Tim

> RI
> ---
> Built-in
> <sun.nio.cs.MS1251$Decoder> Decoding time: 571 millis
> <sun.nio.cs.MS1251$Encoder> Encoding time: 351 millis
> 
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 430 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 551 millis
> 
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 401 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 540 millis
> 
> J9
> ---
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 231 millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 430 millis
> 
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 781 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 620 millis
> 
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 561 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 371 millis
> 
> 
> DRLVM
> ---
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 351 millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 540 millis
> 
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6660 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 1071 millis
> 
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6179 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 451 millis
> 
> With Best Regards,
> Alexei
> 
> 2007/10/11, Oliver Deakin <ol...@googlemail.com>:
>> Tony Wu wrote:
>>> On 10/8/07, Oliver Deakin <ol...@googlemail.com> wrote:
>>>> Are there any particular
>>>> benchmarks you had in mind for this?
>>>>
>>>>
>>> ya, there is a micro benchmark on HARMONY-3709
>>>
>>>
>> <SNIP!>
>>
>> I have run the micro benchmark on Harmony with its current ICU
>> configuration (icu4jni 3.4.4) and on Harmony with pure icu4j 3.8. The
>> results are pretty much as expected - for small jobs icu4j is
>> significantly faster, for large jobs icu4jni comes out on top (full
>> results at the end of this email). It seems that performance-wise there
>> are benefits on both sides depending on the work we are doing.
>>
>> Regards,
>> Oliver
> 

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Alexei Zakharov <al...@gmail.com>.
Hi Oliver,

I've created a small benchmark too. It takes Book One of Leo Tolstoy's
"War and Peace" as input and converts it from Russian CP-1251 to
UTF-16 (10 times) and back (also 10 times). You can find the
benchmark's source code and a build file at [1].  The first difference
from your benchmark is the language and encoding - Russian in my case.
The second difference is the set of tested VMs - I've run the
benchmark on the RI, J9 and DRLVM.

You can find the results below. BTW, the results show that in this
particular test our internal providers (from the
org.apache.harmony.niochar.charset package) are faster than both
versions of ICU. Another interesting fact is the terrible ICU
performance on DRLVM, whereas on J9 it works rather fast. This is
something that should be fixed IMO (the bad performance on DRLVM, I
mean). And finally, yes, ICU4JNI is a little bit faster than ICU4J in
this test. However, "War and Peace" is a rather big book (the paper
version of the first part contains about 400 pages, so 10 repetitions
come to about 4000 pages), and yet the difference in numbers is not so
big.

[1] http://people.apache.org/~ayza/icu_experiments/


RI
---
Built-in
<sun.nio.cs.MS1251$Decoder> Decoding time: 571 millis
<sun.nio.cs.MS1251$Encoder> Encoding time: 351 millis

ICU4j
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 430 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 551 millis

ICU4JNI
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 401 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 540 millis

J9
---
Built-in
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 231 millis
<org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 430 millis

ICU4j
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 781 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 620 millis

ICU4JNI
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 561 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 371 millis


DRLVM
---
Built-in
<org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 351 millis
<org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 540 millis

ICU4j
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6660 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 1071 millis

ICU4JNI
<com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6179 millis
<com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 451 millis

With Best Regards,
Alexei

2007/10/11, Oliver Deakin <ol...@googlemail.com>:
> Tony Wu wrote:
> > On 10/8/07, Oliver Deakin <ol...@googlemail.com> wrote:
> >> Are there any particular
> >> benchmarks you had in mind for this?
> >>
> >>
> > ya, there is a micro benchmark on HARMONY-3709
> >
> >
> <SNIP!>
>
> I have run the micro benchmark on Harmony with its current ICU
> configuration (icu4jni 3.4.4) and on Harmony with pure icu4j 3.8. The
> results are pretty much as expected - for small jobs icu4j is
> significantly faster, for large jobs icu4jni comes out on top (full
> results at the end of this email). It seems that performance-wise there
> are benefits on both sides depending on the work we are doing.
>
> Regards,
> Oliver

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Tony Wu wrote:
> On 10/8/07, Oliver Deakin <ol...@googlemail.com> wrote:
>> Are there any particular
>> benchmarks you had in mind for this?
>>
>>     
> ya, there is a micro benchmark on HARMONY-3709
>
>   
<SNIP!>

I have run the micro benchmark on Harmony with its current ICU
configuration (icu4jni 3.4.4) and on Harmony with pure icu4j 3.8. The 
results are pretty much as expected - for small jobs icu4j is 
significantly faster, for large jobs icu4jni comes out on top (full 
results at the end of this email). It seems that performance-wise there 
are benefits on both sides depending on the work we are doing.

Regards,
Oliver


Encoding:

Small Input:
Encoding: GB18030 , 1000000 times
J Milliseconds: 1015.0
JNI Milliseconds: 1000.0
J/JNI Percentage: 101.5
JNI/J Percentage: 98.5

Small Input:
Encoding: ISO-8859-1 , 1000000 times
J Milliseconds: 328.0
JNI Milliseconds: 703.0
J/JNI Percentage: 46.7
JNI/J Percentage: 214.3

Small Input:
Encoding: UTF-8 , 1000000 times
J Milliseconds: 343.0
JNI Milliseconds: 594.0
J/JNI Percentage: 57.7
JNI/J Percentage: 173.2

Large Input:
Encoding: GB18030 , 1000 times
J Milliseconds: 7312.0
JNI Milliseconds: 2984.0
J/JNI Percentage: 245.0
JNI/J Percentage: 40.8

Large Input:
Encoding: ISO-8859-1 , 1000 times
J Milliseconds: 188.0
JNI Milliseconds: 110.0
J/JNI Percentage: 170.9
JNI/J Percentage: 58.5

Large Input:
Encoding: UTF-8 , 1000 times
J Milliseconds: 594.0
JNI Milliseconds: 359.0
J/JNI Percentage: 165.5
JNI/J Percentage: 60.4


Decoding:

Small Input:
Decoding: GB18030 , 1000000 times
J Milliseconds: 625.0
JNI Milliseconds: 766.0
J/JNI Percentage: 81.6
JNI/J Percentage: 122.6

Small Input:
Decoding: ISO-8859-1 , 1000000 times
J Milliseconds: 328.0
JNI Milliseconds: 781.0
J/JNI Percentage: 41.9
JNI/J Percentage: 238.1

Small Input:
Decoding: UTF-8 , 1000000 times
J Milliseconds: 360.0
JNI Milliseconds: 688.0
J/JNI Percentage: 52.3
JNI/J Percentage: 191.1

Large Input:
Decoding: GB18030 , 1000 times
J Milliseconds: 1969.0
JNI Milliseconds: 1719.0
J/JNI Percentage: 114.5
JNI/J Percentage: 87.3

Large Input:
Decoding: ISO-8859-1 , 1000 times
J Milliseconds: 140.0
JNI Milliseconds: 78.0
J/JNI Percentage: 179.5
JNI/J Percentage: 55.7

Large Input:
Decoding: UTF-8 , 1000 times
J Milliseconds: 719.0
JNI Milliseconds: 187.0
J/JNI Percentage: 384.4
JNI/J Percentage: 26.0
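
For reference, the two percentage lines in each block are just the
ratio of the two timings, rounded to one decimal place. A small sketch,
using the small-input ISO-8859-1 encoding figures from above; the class
name RatioCheck and the helper percent are hypothetical.

```java
public class RatioCheck {
    // Ratio of two timings as a percentage, rounded to one decimal place.
    static double percent(double numeratorMillis, double denominatorMillis) {
        return Math.round(numeratorMillis / denominatorMillis * 1000.0) / 10.0;
    }

    public static void main(String[] args) {
        double j = 328.0;   // icu4j time for small-input ISO-8859-1 encoding
        double jni = 703.0; // icu4jni time for the same job
        System.out.println("J/JNI Percentage: " + percent(j, jni));
        System.out.println("JNI/J Percentage: " + percent(jni, j));
    }
}
```

A J/JNI value below 100 means icu4j was faster for that job; above 100
means icu4jni was faster.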

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Tim Ellison wrote:
> Tony Wu wrote:
>   
>> On 10/9/07, Oliver Deakin <ol...@googlemail.com> wrote:
>>     
> <snip>
>   
>>> J9's dependency on the ICUInterface dll is purely a dllload() call so
>>> that the library is initialised before use by the class library. I
>>> believe this can be easily worked around with the current J9 VME by
>>> simply having a dummy dll in jre/bin for it to load.
>>>
>>>       
>> would you pls help to build a new J9 VME for this purpose :)
>>     
>
> I think Oliver meant that creating a dummy ICUInterface.dll would allow
> the current VME to work without modification.
>   

Yes, sorry if I wasn't clear enough - that's exactly what I meant.

> <snip>
>   
>>> Which data in luni have you moved to the ICU versions? There is some
>>> conversation in another branch of this thread [3] about keeping Harmony
>>> versions of charsets - is that relevant to what you're looking at?
>>>
>>>       
>> Sorry, I should have clarified that it's not about charsets. I
>> delegated all the classes in luni which depend on their internal
>> resource bundle data to ICU's impl, so that we can remove all the
>> classes in the package org/apache/harmony/luni/internal/locale
>> later. That will give a 2 megabyte decrease in the Harmony source code.
>>     
>
> Wow - now that is worth having, let me know if I can help.
>
> Does ICU4J have a means of updating the data (without moving up to a new
> release)?  IIRC ICU4C has an associated tool to update timezone and
> locale info when, for example, the Olson timezone data is updated for
> new daylight savings.  I realize I'm probably asking on the wrong list,
> just wondered if you knew -- I couldn't see it obviously on the website.
>   

According to this page [1] there is a Data Customizer tool (left hand 
menu bar) which I imagine will do what you're asking for. Unfortunately 
I couldn't check because I get a 503 error when I click the link! 
However, I googled the page and took a look at the cached version (I've 
been to this page recently so it must just be temporarily down) and 
there is an option to generate resources for ICU4J as well as ICU4C. The 
tool is also mentioned on the 3.8 Release page [2]. It sounds like this 
is the new tool for ICU4J/C so I hope it would have the same 
functionality as the previous tool.

Regards,
Oliver

[1] http://demo.icu-project.org/icu-bin/icudemos
[2] http://icu-project.org/download/3.8.html

> Regards,
> Tim
>
>   

-- 
Oliver Deakin
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tony Wu <wu...@gmail.com>.
On 10/11/07, Tim Ellison <t....@gmail.com> wrote:
>
>
> Tony Wu wrote:
> > On 10/9/07, Oliver Deakin <ol...@googlemail.com> wrote:
> <snip>
> >> J9's dependency on the ICUInterface dll is purely a dllload() call so
> >> that the library is initialised before use by the class library. I
> >> believe this can be easily worked around with the current J9 VME by
> >> simply having a dummy dll in jre/bin for it to load.
> >>
> > Would you please help to build a new J9 VME for this purpose? :)
>
> I think Oliver meant that creating a dummy ICUInterface.dll would allow
> the current VME to work without modification.
>

Oh yes, I got it.
> <snip>
> >> Which data in luni have you moved to the ICU versions? There is some
> >> conversation in another branch of this thread [3] about keeping Harmony
> >> versions of charsets - is that relevant to what you're looking at?
> >>
> > Sorry, I should have clarified that it's not about charsets. I
> > delegated all the classes in luni which depend on their internal
> > resource bundle data to ICU's impl, so that we can later remove all
> > the classes in the package org/apache/harmony/luni/internal/locale.
> > That will cut 2 megabytes from the Harmony source code.
>
> Wow - now that is worth having, let me know if I can help.
>
> Does ICU4J have a means of updating the data (without moving up to a new
> release)?  IIRC ICU4C has an associated tool to update timezone and
> locale info when, for example, the Olson timezone data is updated for
> new daylight savings.  I realize I'm probably asking on the wrong list,
> just wondered if you knew -- I couldn't see it obviously on the website.
>
Yes, we can modify the data files and then rebuild them with ICU's tools.
I've customized the data for icu4j 3.4 this way before and it worked.

> Regards,
> Tim
>


-- 
Tony Wu
China Software Development Lab, IBM

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tim Ellison <t....@gmail.com>.

Tony Wu wrote:
> On 10/9/07, Oliver Deakin <ol...@googlemail.com> wrote:
<snip>
>> J9's dependency on the ICUInterface dll is purely a dllload() call so
>> that the library is initialised before use by the class library. I
>> believe this can be easily worked around with the current J9 VME by
>> simply having a dummy dll in jre/bin for it to load.
>>
> Would you please help to build a new J9 VME for this purpose? :)

I think Oliver meant that creating a dummy ICUInterface.dll would allow
the current VME to work without modification.

<snip>
>> Which data in luni have you moved to the ICU versions? There is some
>> conversation in another branch of this thread [3] about keeping Harmony
>> versions of charsets - is that relevant to what you're looking at?
>>
> Sorry, I should have clarified that it's not about charsets. I
> delegated all the classes in luni which depend on their internal
> resource bundle data to ICU's impl, so that we can later remove all
> the classes in the package org/apache/harmony/luni/internal/locale.
> That will cut 2 megabytes from the Harmony source code.

Wow - now that is worth having, let me know if I can help.

Does ICU4J have a means of updating the data (without moving up to a new
release)?  IIRC ICU4C has an associated tool to update timezone and
locale info when, for example, the Olson timezone data is updated for
new daylight savings.  I realize I'm probably asking on the wrong list,
just wondered if you knew -- I couldn't see it obviously on the website.
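
To make the question concrete: one rough way to tell whether a runtime has
picked up a timezone data update is to probe a date affected by a known rule
change. The sketch below uses only the standard java.util.TimeZone API, not
any ICU tool, and takes the 2007 US daylight saving change as its probe (from
2007, US DST starts in mid-March rather than early April). The class and
method names are made up for illustration.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class TzDataProbe {
    // Returns true if the runtime's data for America/New_York reflects the
    // 2007 US rules. 20 March 2007 is in DST under the new rules but was
    // still standard time under the old ones.
    static boolean hasUs2007DstRules() {
        TimeZone tz = TimeZone.getTimeZone("America/New_York");
        Calendar cal = new GregorianCalendar(tz);
        cal.clear();
        cal.set(2007, Calendar.MARCH, 20, 12, 0, 0);
        return tz.inDaylightTime(cal.getTime());
    }

    public static void main(String[] args) {
        System.out.println("2007 US DST rules present: " + hasUs2007DstRules());
    }
}
```

A check like this only says whether the data is current, of course; it says
nothing about how the update gets applied.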

Regards,
Tim

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tony Wu <wu...@gmail.com>.
On 10/9/07, Oliver Deakin <ol...@googlemail.com> wrote:
> Tony Wu wrote:
> > On 10/8/07, Oliver Deakin <ol...@googlemail.com> wrote:
> >
> >> Tony Wu wrote:
> >>
> >>> Sorry for the late reply, I have just returned from the national holiday.
> >>>
> >>>
> >> No problem :)
> >>
> >>
> >>> About the performance issue, native code is faster but a JNI call is
> >>> expensive. So icu4j 3.4 is good at encoding/decoding a few bytes,
> >>> whereas icu4jni 3.4 is good at handling thousands of bytes. Bidi is
> >>> another story; I'll try to compare the native impl and the Java impl
> >>> later in 3.8.
> >>>
> >> Thanks Tony - it will be interesting to know how pure Java ICU performs
> >> within Harmony compared to the current setup. If it helps I can open a
> >> JIRA for moving to ICU4J 3.8 and attach a basic patch to eliminate the
> >> need for icu4jni/icu4c to get you started.
> >>
> > OK. We may need to change the code in both DRLVM and J9, since the
> > icu4c library is referenced directly by them.
> >
>
> Yes - In [1] Pavel mentioned that DRLVM uses ICU4C for some tasks. I
> think, as Pavel later suggests [2], we could move the ICU4C libraries to
> the DRLVM repository when they are no longer required by classlib and
> raise a JIRA for further discussion as to whether we want to remove this
> dependency for DRLVM also or keep ICU4C and upgrade to 3.8.
>
Agreed.

> J9's dependency on the ICUInterface dll is purely a dllload() call so
> that the library is initialised before use by the class library. I
> believe this can be easily worked around with the current J9 VME by
> simply having a dummy dll in jre/bin for it to load.
>
Would you please help to build a new J9 VME for this purpose? :)
> >
> >> Are there any particular
> >> benchmarks you had in mind for this?
> >>
> >>
> > Yes, there is a micro benchmark on HARMONY-3709.
> >
>
> Thanks Tony - I'll take a look.
>
> >
> >>> Anyway, it is worth doing some work to remove the 10 MB
> >>> dependency (icu4c).
> >>>
> >>> BTW, we keep some resource bundle classes in luni, such as Locale and
> >>> Currency, which are used by the luni and text modules. This data is
> >>> also included in ICU, so I suggest removing the overlap and keeping
> >>> just one copy.
> >>>
> >> Agreed - if we can use the ICU version of these resources then IMHO we
> >> should do it.
> >>
> >>
> > I've successfully changed the data in luni but ran into some problems
> > in text, since the resource bundles for locale are organized
> > differently from each other. Unfortunately there is no doc for the
> > current Harmony impl, so I need some time to try things and work them out.
> >
>
> Which data in luni have you moved to the ICU versions? There is some
> conversation in another branch of this thread [3] about keeping Harmony
> versions of charsets - is that relevant to what you're looking at?
>
Sorry, I should have clarified that it's not about charsets. I
delegated all the classes in luni which depend on their internal
resource bundle data to ICU's impl, so that we can later remove all
the classes in the package org/apache/harmony/luni/internal/locale.
That will cut 2 megabytes from the Harmony source code.
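
To illustrate the delegation idea, here is a minimal sketch and not the
actual patch. The interface and class names are hypothetical, and the backend
shown uses the running JDK's own data as a stand-in; in Harmony that role
would be played by ICU4J. The point is only that the facade class keeps no
resource bundle data of its own.

```java
import java.util.Locale;

// Facade that holds no locale data itself; every lookup is forwarded.
class DelegatingLocaleNames {
    private final LocaleDataSource source;

    DelegatingLocaleNames(LocaleDataSource source) {
        this.source = source;
    }

    String countryName(Locale target, Locale displayLocale) {
        return source.displayCountry(target, displayLocale);
    }

    public static void main(String[] args) {
        DelegatingLocaleNames names = new DelegatingLocaleNames(new JdkLocaleData());
        System.out.println(names.countryName(Locale.FRANCE, Locale.ENGLISH));
    }
}

// Hypothetical seam: the single place that knows where the data comes from.
interface LocaleDataSource {
    String displayCountry(Locale target, Locale displayLocale);
}

// Stand-in backend (assumption): uses the JDK's data so the sketch runs
// without the ICU4J jar on the classpath.
class JdkLocaleData implements LocaleDataSource {
    public String displayCountry(Locale target, Locale displayLocale) {
        return target.getDisplayCountry(displayLocale);
    }
}
```

Once every caller goes through a seam like this, the internal bundle
classes have no remaining users and can be deleted.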

> Regards,
> Oliver
>
> [1]
> http://mail-archives.apache.org/mod_mbox/harmony-dev/200710.mbox/%3ce0f125db0710010352r251b0f1dt2080e8d41b849bcc@mail.gmail.com%3e
> [2]
> http://mail-archives.apache.org/mod_mbox/harmony-dev/200710.mbox/%3ce0f125db0710010818o4af7073fibe0f6a1222de6dff@mail.gmail.com%3e
> [3]
> http://mail-archives.apache.org/mod_mbox/harmony-dev/200710.mbox/%3c2c9597b90710090458v44aa7c07q48e5bee3898a582c@mail.gmail.com%3e
>


-- 
Tony Wu
China Software Development Lab, IBM

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Tony Wu wrote:
> On 10/8/07, Oliver Deakin <ol...@googlemail.com> wrote:
>   
>> Tony Wu wrote:
>>     
>>> Sorry for the late reply, I have just returned from the national holiday.
>>>
>>>       
>> No problem :)
>>
>>     
>>> About the performance issue, native code is faster but a JNI call is
>>> expensive. So icu4j 3.4 is good at encoding/decoding a few bytes,
>>> whereas icu4jni 3.4 is good at handling thousands of bytes. Bidi is
>>> another story; I'll try to compare the native impl and the Java impl
>>> later in 3.8.
>>>       
>> Thanks Tony - it will be interesting to know how pure Java ICU performs
>> within Harmony compared to the current setup. If it helps I can open a
>> JIRA for moving to ICU4J 3.8 and attach a basic patch to eliminate the
>> need for icu4jni/icu4c to get you started.
>>     
> OK. We may need to change the code in both DRLVM and J9, since the
> icu4c library is referenced directly by them.
>   

Yes - In [1] Pavel mentioned that DRLVM uses ICU4C for some tasks. I 
think, as Pavel later suggests [2], we could move the ICU4C libraries to 
the DRLVM repository when they are no longer required by classlib and 
raise a JIRA for further discussion as to whether we want to remove this 
dependency for DRLVM also or keep ICU4C and upgrade to 3.8.

J9's dependency on the ICUInterface dll is purely a dllload() call so 
that the library is initialised before use by the class library. I 
believe this can be easily worked around with the current J9 VME by 
simply having a dummy dll in jre/bin for it to load.

>   
>> Are there any particular
>> benchmarks you had in mind for this?
>>
>>     
> Yes, there is a micro benchmark on HARMONY-3709.
>   

Thanks Tony - I'll take a look.

>   
>>> Anyway, it is worth doing some work to remove the 10 MB
>>> dependency (icu4c).
>>>
>>> BTW, we keep some resource bundle classes in luni, such as Locale and
>>> Currency, which are used by the luni and text modules. This data is
>>> also included in ICU, so I suggest removing the overlap and keeping
>>> just one copy.
>>>       
>> Agreed - if we can use the ICU version of these resources then IMHO we
>> should do it.
>>
>>     
> I've successfully changed the data in luni but ran into some problems
> in text, since the resource bundles for locale are organized
> differently from each other. Unfortunately there is no doc for the
> current Harmony impl, so I need some time to try things and work them out.
>   

Which data in luni have you moved to the ICU versions? There is some 
conversation in another branch of this thread [3] about keeping Harmony 
versions of charsets - is that relevant to what you're looking at?

Regards,
Oliver

[1] 
http://mail-archives.apache.org/mod_mbox/harmony-dev/200710.mbox/%3ce0f125db0710010352r251b0f1dt2080e8d41b849bcc@mail.gmail.com%3e
[2] 
http://mail-archives.apache.org/mod_mbox/harmony-dev/200710.mbox/%3ce0f125db0710010818o4af7073fibe0f6a1222de6dff@mail.gmail.com%3e
[3] 
http://mail-archives.apache.org/mod_mbox/harmony-dev/200710.mbox/%3c2c9597b90710090458v44aa7c07q48e5bee3898a582c@mail.gmail.com%3e


-- 
Oliver Deakin
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tony Wu <wu...@gmail.com>.
On 10/8/07, Oliver Deakin <ol...@googlemail.com> wrote:
> Tony Wu wrote:
> > Sorry for the late reply, I have just returned from the national holiday.
> >
>
> No problem :)
>
> > About the performance issue, native code is faster but a JNI call is
> > expensive. So icu4j 3.4 is good at encoding/decoding a few bytes,
> > whereas icu4jni 3.4 is good at handling thousands of bytes. Bidi is
> > another story; I'll try to compare the native impl and the Java impl
> > later in 3.8.
>
> Thanks Tony - it will be interesting to know how pure Java ICU performs
> within Harmony compared to the current setup. If it helps I can open a
> JIRA for moving to ICU4J 3.8 and attach a basic patch to eliminate the
> need for icu4jni/icu4c to get you started.
OK. We may need to change the code in both DRLVM and J9, since the
icu4c library is referenced directly by them.

> Are there any particular
> benchmarks you had in mind for this?
>
Yes, there is a micro benchmark on HARMONY-3709.
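
That benchmark is not reproduced here. As a rough idea of the shape such a
comparison could take, below is a minimal sketch (an illustration only, not
the HARMONY-3709 code) that times charset encoding for a small and a large
input through the standard java.nio.charset API. The class and method names
are made up, and a plain timing loop like this is only indicative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

class EncodeTiming {
    // Encodes `text` `iterations` times and returns the total number of
    // bytes produced, so the work cannot be optimised away.
    static long encodeRepeatedly(Charset cs, String text, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            ByteBuffer out = cs.encode(CharBuffer.wrap(text));
            total += out.remaining();
        }
        return total;
    }

    // Average wall-clock nanoseconds per encode call, after one warm-up pass.
    static long nanosPerCall(Charset cs, String text, int iterations) {
        encodeRepeatedly(cs, text, iterations);
        long start = System.nanoTime();
        encodeRepeatedly(cs, text, iterations);
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append('x');
        }
        System.out.println("small: " + nanosPerCall(utf8, "hello", 100000) + " ns/call");
        System.out.println("large: " + nanosPerCall(utf8, sb.toString(), 1000) + " ns/call");
    }
}
```

Running the same loop against the native-backed and the pure Java charset
providers should show where the fixed per-call cost stops dominating.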

> > Anyway, it is worth doing some work to remove the 10 MB
> > dependency (icu4c).
> >
> > BTW, we keep some resource bundle classes in luni, such as Locale and
> > Currency, which are used by the luni and text modules. This data is
> > also included in ICU, so I suggest removing the overlap and keeping
> > just one copy.
>
> Agreed - if we can use the ICU version of these resources then IMHO we
> should do it.
>
I've successfully changed the data in luni but ran into some problems
in text, since the resource bundles for locale are organized
differently from each other. Unfortunately there is no doc for the
current Harmony impl, so I need some time to try things and work them out.

> Regards,
> Oliver
>


-- 
Tony Wu
China Software Development Lab, IBM

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Tony Wu wrote:
> Sorry for the late reply, I have just returned from the national holiday.
>   

No problem :)

> About the performance issue, native code is faster but a JNI call is
> expensive. So icu4j 3.4 is good at encoding/decoding a few bytes,
> whereas icu4jni 3.4 is good at handling thousands of bytes. Bidi is
> another story; I'll try to compare the native impl and the Java impl
> later in 3.8.

Thanks Tony - it will be interesting to know how pure Java ICU performs 
within Harmony compared to the current setup. If it helps I can open a 
JIRA for moving to ICU4J 3.8 and attach a basic patch to eliminate the 
need for icu4jni/icu4c to get you started. Are there any particular 
benchmarks you had in mind for this?

> Anyway, it is worth doing some work to remove the 10 MB
> dependency (icu4c).
>
> BTW, we keep some resource bundle classes in luni, such as Locale and
> Currency, which are used by the luni and text modules. This data is
> also included in ICU, so I suggest removing the overlap and keeping
> just one copy.

Agreed - if we can use the ICU version of these resources then IMHO we 
should do it.

Regards,
Oliver

> The good news is that ICU 3.8 provides a data customization tool [1].
> I've tried it and reported a failure when customizing icu4j.
>
> [1]http://apps.icu-project.org/datacustom/
>
>

-- 
Oliver Deakin
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tony Wu <wu...@gmail.com>.
Sorry for the late reply, I have just returned from the national holiday.

About the performance issue, native code is faster but a JNI call is
expensive. So icu4j 3.4 is good at encoding/decoding a few bytes,
whereas icu4jni 3.4 is good at handling thousands of bytes. Bidi is
another story; I'll try to compare the native impl and the Java impl
later in 3.8. Anyway, it is worth doing some work to remove the 10 MB
dependency (icu4c).
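
For the Bidi comparison, the kind of call involved looks roughly like the
sketch below. It uses the standard java.text.Bidi API, which is what the
class library has to provide whichever implementation sits underneath; the
class name BidiSketch and the sample string are illustrative only.

```java
import java.text.Bidi;

class BidiSketch {
    // Analyses one paragraph of text with a left-to-right base direction.
    static Bidi analyse(String text) {
        return new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
    }

    public static void main(String[] args) {
        // Latin letters, a space, then three Hebrew letters.
        Bidi bidi = analyse("abc \u05D0\u05D1\u05D2");
        System.out.println("mixed: " + bidi.isMixed());
        for (int i = 0; i < bidi.getRunCount(); i++) {
            System.out.println("run " + i + ": [" + bidi.getRunStart(i) + ", "
                    + bidi.getRunLimit(i) + ") level " + bidi.getRunLevel(i));
        }
    }
}
```

Checking that both implementations report the same runs and levels for a set
of mixed-direction strings would be a sensible first step before timing them.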

BTW, we keep some resource bundle classes in luni, such as Locale and
Currency, which are used by the luni and text modules. This data is
also included in ICU, so I suggest removing the overlap and keeping
just one copy. The good news is that ICU 3.8 provides a data
customization tool [1]. I've tried it and reported a failure when
customizing icu4j.

[1]http://apps.icu-project.org/datacustom/


On 10/2/07, Oliver Deakin <ol...@googlemail.com> wrote:
> Hi Alexei,
>
> Yes, performance is an issue here. I would envisage that for
> small/medium icu jobs we will see a performance increase due to calls
> into icu C code via jni adding an overhead. For larger conversion jobs
> we may find that icu4c/icu4jni provide better results. Clearly this is
> something that needs to be weighed up against the pros of moving to
> using a purely Java implementation. Are there any tests you might
> suggest to make a performance comparison?
>
> I see Tony has spoken to the icu developers before about this issue [1].
> Do you have any input Tony?
>
> Regards,
> Oliver
>
> [1] http://www.nabble.com/From-icu4jni-to-icu4j-t3543140.html
>
> Alexei Zakharov wrote:
> > Hi Oliver and all,
> >
> >
> >> The first thing is that icu4jni is no longer supported from this release
> >> onwards. The icu4jni api have been incorporated into icu4j and are
> >> implemented in pure Java now.
> >>
> >
> >
> >> Secondly, the Bidi class has also been implemented fully in icu4j now,
> >> so it is possible for us to also drop icu4c as a dependency and use pure
> >> icu4j for this functionality.
> >>
> >
> > I have no objections. Only like to be sure we won't get significant
> > performance degradation by moving from native implementation to pure
> > Java.
> >
> > Thanks,
> > Alexei
> >
> > 2007/10/1, Oliver Deakin <ol...@googlemail.com>:
> >
> >> Hi all,
> >>
> >> I have been looking recently at what it would take for us to step up to
> >> icu4j 3.8 and thought I would give everyone a heads up on what I have
> >> discovered.
> >>
> >> The first thing is that icu4jni is no longer supported from this release
> >> onwards. The icu4jni APIs have been incorporated into icu4j and are
> >> implemented in pure Java now.
> >> Secondly, the Bidi class has also been implemented fully in icu4j now,
> >> so it is possible for us to also drop icu4c as a dependency and use pure
> >> icu4j for this functionality.
> >>
> >> The major advantage I see of moving to pure icu4j 3.8 is that we no
> >> longer need to maintain prebuilt binaries of the icu4c and icu4jni
> >> libraries across all platforms in our repository. This simplifies the
> >> process of upgrading to new versions of icu and also allows us to move
> >> to new platforms with greater ease.
> >>
> >> I am currently testing a patch to switch over to icu 3.8 and completely
> >> remove the need for icu4c/jni. I have discovered a couple of bugs in the
> >> new Bidi functionality [1] which I have raised on the icu dev list and
> >> are in the process of being fixed. I hope that once they are all
> >> resolved we will be able to pick up a patched icu4j 3.8 jar for our use.
> >>
> >> I'm interested to hear if anyone has any comments/objections to this?
> >>
> >> Regards,
> >> Oliver
> >>
> >> [1]
> >> http://bugs.icu-project.org/trac/ticket/5952
> >> http://bugs.icu-project.org/trac/ticket/5961
> >>
> >
> >
>
> --
> Oliver Deakin
>
>


-- 
Tony Wu
China Software Development Lab, IBM

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Hi Alexei,

Yes, performance is an issue here. I would expect a performance
increase for small/medium icu jobs, because calls into the icu C code
via JNI add overhead. For larger conversion jobs we may find that
icu4c/icu4jni give better results. Clearly this needs to be weighed
against the benefits of moving to a purely Java implementation. Are
there any tests you might suggest for making a performance comparison?

I see Tony has spoken to the icu developers before about this issue [1].
Do you have any input, Tony?

Regards,
Oliver

[1] http://www.nabble.com/From-icu4jni-to-icu4j-t3543140.html

Alexei Zakharov wrote:
> Hi Oliver and all,
>
>   
>> The first thing is that icu4jni is no longer supported from this release
>> onwards. The icu4jni APIs have been incorporated into icu4j and are
>> implemented in pure Java now.
>>     
>
>   
>> Secondly, the Bidi class has also been implemented fully in icu4j now,
>> so it is possible for us to also drop icu4c as a dependency and use pure
>> icu4j for this functionality.
>>     
>
> I have no objections. Only like to be sure we won't get significant
> performance degradation by moving from native implementation to pure
> Java.
>
> Thanks,
> Alexei
>
> 2007/10/1, Oliver Deakin <ol...@googlemail.com>:
>   
>> Hi all,
>>
>> I have been looking recently at what it would take for us to step up to
>> icu4j 3.8 and thought I would give everyone a heads up on what I have
>> discovered.
>>
>> The first thing is that icu4jni is no longer supported from this release
>> onwards. The icu4jni APIs have been incorporated into icu4j and are
>> implemented in pure Java now.
>> Secondly, the Bidi class has also been implemented fully in icu4j now,
>> so it is possible for us to also drop icu4c as a dependency and use pure
>> icu4j for this functionality.
>>
>> The major advantage I see of moving to pure icu4j 3.8 is that we no
>> longer need to maintain prebuilt binaries of the icu4c and icu4jni
>> libraries across all platforms in our repository. This simplifies the
>> process of upgrading to new versions of icu and also allows us to move
>> to new platforms with greater ease.
>>
>> I am currently testing a patch to switch over to icu 3.8 and completely
>> remove the need for icu4c/jni. I have discovered a couple of bugs in the
>> new Bidi functionality [1] which I have raised on the icu dev list and
>> are in the process of being fixed. I hope that once they are all
>> resolved we will be able to pick up a patched icu4j 3.8 jar for our use.
>>
>> I'm interested to hear if anyone has any comments/objections to this?
>>
>> Regards,
>> Oliver
>>
>> [1]
>> http://bugs.icu-project.org/trac/ticket/5952
>> http://bugs.icu-project.org/trac/ticket/5961
>>     
>
>   

-- 
Oliver Deakin


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Alexei Zakharov <al...@gmail.com>.
Hi Oliver and all,

> The first thing is that icu4jni is no longer supported from this release
> onwards. The icu4jni APIs have been incorporated into icu4j and are
> implemented in pure Java now.

> Secondly, the Bidi class has also been implemented fully in icu4j now,
> so it is possible for us to also drop icu4c as a dependency and use pure
> icu4j for this functionality.

I have no objections. I would only like to be sure that we won't get a
significant performance degradation by moving from the native
implementation to pure Java.

Thanks,
Alexei

2007/10/1, Oliver Deakin <ol...@googlemail.com>:
> Hi all,
>
> I have been looking recently at what it would take for us to step up to
> icu4j 3.8 and thought I would give everyone a heads up on what I have
> discovered.
>
> The first thing is that icu4jni is no longer supported from this release
> onwards. The icu4jni APIs have been incorporated into icu4j and are
> implemented in pure Java now.
> Secondly, the Bidi class has also been implemented fully in icu4j now,
> so it is possible for us to also drop icu4c as a dependency and use pure
> icu4j for this functionality.
>
> The major advantage I see of moving to pure icu4j 3.8 is that we no
> longer need to maintain prebuilt binaries of the icu4c and icu4jni
> libraries across all platforms in our repository. This simplifies the
> process of upgrading to new versions of icu and also allows us to move
> to new platforms with greater ease.
>
> I am currently testing a patch to switch over to icu 3.8 and completely
> remove the need for icu4c/jni. I have discovered a couple of bugs in the
> new Bidi functionality [1] which I have raised on the icu dev list and
> are in the process of being fixed. I hope that once they are all
> resolved we will be able to pick up a patched icu4j 3.8 jar for our use.
>
> I'm interested to hear if anyone has any comments/objections to this?
>
> Regards,
> Oliver
>
> [1]
> http://bugs.icu-project.org/trac/ticket/5952
> http://bugs.icu-project.org/trac/ticket/5961

Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Tim Ellison wrote:
> Again, sorry for the late response...
>
> Oliver Deakin wrote:
>   
>> Further on upgrading to ICU4J 3.8, when I run the text tests I see
>> failures in BidiTest - namely: testCreateLineBidi;
>> testCreateLineBidiInvalid; testGetRunLimit. For the Bidi scenarios in
>> these tests ICU throws IllegalArgumentExceptions due to invalid
>> parameters being passed through, whereas the RI ignores the fact that
>> these parameters are illegal (one of the differences has been discussed
>> previously in [1]). More precisely, the tests [2] and [3] throw
>> exceptions on ICU 3.8 but complete successfully on the RI.
>>
>> I have been in conversation with the Bidi developer and it seems that
>> ICU is keeping in line with the spec while the RI is allowing illegal
>> cases. The developer I have been talking to has asked if I feel these
>> differences should be fixed in ICU, so I thought I would throw this
>> question out to the Harmony community as it will be something that
>> affects the behaviour of our Bidi class if we move to ICU4j 3.8. IMHO it
>> is not a problem to follow the spec and differ from the RI, as ICU
>> currently does, in these invalid cases. Does anyone object to this?
>>     
>
> I think we follow usual procedure here, which is to follow the spec if
> the spec is being more reasonable than the RI, update our tests, and
> list them as non-bug differences in JIRA.
>   

Exactly what I was thinking - these cases all follow the spec correctly,
while the RI does not. I was planning to raise them as non-bug
differences when I carry out the changes to move to icu 3.8.

> If we find key applications that rely upon the silent-ignore behavior
> then we might change our mind and depart from the spec, considering it a
> de facto update.
>
>   

Agreed - it seems that the ICU developers would be happy to help us out 
here by making the changes at their end if we had a good enough reason to.

Regards,
Oliver

> Regards,
> Tim
>
>
>   

-- 
Oliver Deakin


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Tim Ellison <t....@gmail.com>.
Again, sorry for the late response...

Oliver Deakin wrote:
> Further on upgrading to ICU4J 3.8, when I run the text tests I see
> failures in BidiTest - namely: testCreateLineBidi;
> testCreateLineBidiInvalid; testGetRunLimit. For the Bidi scenarios in
> these tests ICU throws IllegalArgumentExceptions due to invalid
> parameters being passed through, whereas the RI ignores the fact that
> these parameters are illegal (one of the differences has been discussed
> previously in [1]). More precisely, the tests [2] and [3] throw
> exceptions on ICU 3.8 but complete successfully on the RI.
> 
> I have been in conversation with the Bidi developer and it seems that
> ICU is keeping in line with the spec while the RI is allowing illegal
> cases. The developer I have been talking to has asked if I feel these
> differences should be fixed in ICU, so I thought I would throw this
> question out to the Harmony community as it will be something that
> affects the behaviour of our Bidi class if we move to ICU4j 3.8. IMHO it
> is not a problem to follow the spec and differ from the RI, as ICU
> currently does, in these invalid cases. Does anyone object to this?

I think we follow our usual procedure here, which is to follow the spec
if the spec is being more reasonable than the RI, update our tests, and
list them as non-bug differences in JIRA.

If we find key applications that rely upon the silent-ignore behavior
then we might change our mind and depart from the spec, considering it a
de facto update.
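
One way to record which behaviour a given class library shows, before listing a case as a non-bug difference, is a small probe that reports whether a call throws or is silently accepted. This is only a sketch: the class and method names are made up for illustration, it assumes a Java version with lambdas, and it makes no claim about what any particular implementation will print.

```java
import java.text.Bidi;

class BidiBehaviourProbe {
    // Runs the action and reports whether it threw IllegalArgumentException.
    // Any other exception is left to propagate.
    static String probe(Runnable action) {
        try {
            action.run();
            return "accepted silently";
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException";
        }
    }

    public static void main(String[] args) {
        Bidi bidi = new Bidi("text", Bidi.DIRECTION_LEFT_TO_RIGHT);
        // Run index 1 is out of range because the run count is 1.
        System.out.println("getRunLimit(1): " + probe(() -> bidi.getRunLimit(1)));
        // Run index 0 is the only valid index here.
        System.out.println("getRunLimit(0): " + probe(() -> bidi.getRunLimit(0)));
    }
}
```

The output from the RI and from a spec-following implementation could then be attached to the JIRA entry for the difference.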

Regards,
Tim


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
I forgot to link the ICU bugs I raised for these issues:
http://bugs.icu-project.org/trac/ticket/5973
http://bugs.icu-project.org/trac/ticket/5974

Regards,
Oliver

Oliver Deakin wrote:
> Further on upgrading to ICU4J 3.8, when I run the text tests I see 
> failures in BidiTest - namely: testCreateLineBidi; 
> testCreateLineBidiInvalid; testGetRunLimit. For the Bidi scenarios in 
> these tests ICU throws IllegalArgumentExceptions due to invalid 
> parameters being passed through, whereas the RI ignores the fact that 
> these parameters are illegal (one of the differences has been 
> discussed previously in [1]). More precisely, the tests [2] and [3] 
> throw exceptions on ICU 3.8 but complete successfully on the RI.
>
> I have been in conversation with the Bidi developer and it seems that 
> ICU is keeping in line with the spec while the RI is allowing illegal 
> cases. The developer I have been talking to has asked if I feel these 
> differences should be fixed in ICU, so I thought I would throw this 
> question out to the Harmony community as it will be something that 
> affects the behaviour of our Bidi class if we move to ICU4j 3.8. IMHO 
> it is not a problem to follow the spec and differ from the RI, as ICU 
> currently does, in these invalid cases. Does anyone object to this?
>
> Regards,
> Oliver
>
>
> [1] https://issues.apache.org/jira/browse/HARMONY-649
>
> [2] This test requests an illegal run limit value. Only values between 
> 0 (included) and run count (excluded) are valid. Since the run count 
> in this case is 1, it is only valid to request the 0th run with 
> getRunLimit. The RI ignores this invalid request while ICU throws 
> IllegalArgumentException
> public class Testa {
> public static void main(String[] args) throws Throwable {
> Bidi bidi = new Bidi("text", Bidi.DIRECTION_LEFT_TO_RIGHT);
> bidi.getRunLimit(1);
> }
> }
>
> [3] We try to create an invalid Line Bidi containing a \n character. 
> The RI ignores the fact this Line Bidi is invalid while ICU throws 
> IllegalArgumentException.
> public class Testb {
> public static void main(String[] args) throws Throwable {
> Bidi bd = new Bidi("a\u05D0a\na\u05D0\"\u05D0a".toCharArray(), 0, new 
> byte[] { 0, 0, 0, -3, -3, 2, 2, 0, 3 }, 0, 9, 
> Bidi.DIRECTION_RIGHT_TO_LEFT);
> Bidi line = bd.createLineBidi(2, 7);
> }
> }
>
> Oliver Deakin wrote:
>> Hi all,
>>
>> I have been looking recently at what it would take for us to step up 
>> to icu4j 3.8 and thought I would give everyone a heads up on what I 
>> have discovered.
>>
>> The first thing is that icu4jni is no longer supported from this 
>> release onwards. The icu4jni APIs have been incorporated into icu4j
>> and are implemented in pure Java now.
>> Secondly, the Bidi class has also been implemented fully in icu4j 
>> now, so it is possible for us to also drop icu4c as a dependency and 
>> use pure icu4j for this functionality.
>>
>> The major advantage I see of moving to pure icu4j 3.8 is that we no 
>> longer need to maintain prebuilt binaries of the icu4c and icu4jni 
>> libraries across all platforms in our repository. This simplifies the 
>> process of upgrading to new versions of icu and also allows us to 
>> move to new platforms with greater ease.
>>
>> I am currently testing a patch to switch over to icu 3.8 and 
>> completely remove the need for icu4c/jni. I have discovered a couple 
>> of bugs in the new Bidi functionality [1] which I have raised on the 
>> icu dev list and are in the process of being fixed. I hope that once 
>> they are all resolved we will be able to pick up a patched icu4j 3.8 
>> jar for our use.
>>
>> I'm interested to hear if anyone has any comments/objections to this?
>>
>> Regards,
>> Oliver
>>
>> [1]
>> http://bugs.icu-project.org/trac/ticket/5952
>> http://bugs.icu-project.org/trac/ticket/5961
>>
>

-- 
Oliver Deakin


Re: [classlib][icu] Bringing ICU level up to 3.8

Posted by Oliver Deakin <ol...@googlemail.com>.
Further on upgrading to ICU4J 3.8, when I run the text tests I see 
failures in BidiTest - namely: testCreateLineBidi; 
testCreateLineBidiInvalid; testGetRunLimit. For the Bidi scenarios in 
these tests ICU throws IllegalArgumentExceptions due to invalid 
parameters being passed through, whereas the RI ignores the fact that 
these parameters are illegal (one of the differences has been discussed 
previously in [1]). More precisely, the tests [2] and [3] throw 
exceptions on ICU 3.8 but complete successfully on the RI.

I have been in conversation with the Bidi developer and it seems that 
ICU is keeping in line with the spec while the RI is allowing illegal 
cases. The developer I have been talking to has asked if I feel these 
differences should be fixed in ICU, so I thought I would throw this 
question out to the Harmony community as it will be something that 
affects the behaviour of our Bidi class if we move to ICU4j 3.8. IMHO it 
is not a problem to follow the spec and differ from the RI, as ICU 
currently does, in these invalid cases. Does anyone object to this?

Regards,
Oliver


[1] https://issues.apache.org/jira/browse/HARMONY-649

[2] This test requests an illegal run limit value. Only values between 0
(inclusive) and the run count (exclusive) are valid. Since the run count
in this case is 1, it is only valid to request the 0th run with
getRunLimit. The RI ignores this invalid request while ICU throws
IllegalArgumentException.
import java.text.Bidi;

public class Testa {
    public static void main(String[] args) throws Throwable {
        Bidi bidi = new Bidi("text", Bidi.DIRECTION_LEFT_TO_RIGHT);
        // The run count is 1, so run index 1 is out of range.
        bidi.getRunLimit(1);
    }
}
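
The valid range described in [2] can be made explicit with a small guard around getRunLimit. This is a sketch only: the class and helper names are hypothetical and are not part of Harmony, ICU or the RI; the only API calls used are getRunCount, getRunStart and getRunLimit from java.text.Bidi.

```java
import java.text.Bidi;

class RunLimitCheck {
    // Hypothetical helper: rejects run indices outside [0, runCount)
    // before delegating to Bidi.getRunLimit.
    static int checkedRunLimit(Bidi bidi, int run) {
        int runCount = bidi.getRunCount();
        if (run < 0 || run >= runCount) {
            throw new IllegalArgumentException(
                    "run " + run + " is outside [0, " + runCount + ")");
        }
        return bidi.getRunLimit(run);
    }

    public static void main(String[] args) {
        Bidi bidi = new Bidi("text", Bidi.DIRECTION_LEFT_TO_RIGHT);

        // Iterating up to getRunCount() only ever requests valid runs.
        for (int run = 0; run < bidi.getRunCount(); run++) {
            System.out.println("run " + run + ": [" + bidi.getRunStart(run)
                    + ", " + checkedRunLimit(bidi, run) + ")");
        }

        // The request from the test above is rejected by the guard.
        try {
            checkedRunLimit(bidi, 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Code that only loops from 0 up to getRunCount() never reaches the invalid case, so it should behave the same whether the underlying implementation throws or ignores it.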

[3] We try to create an invalid Line Bidi containing a \n character. The
RI ignores the fact that this Line Bidi is invalid while ICU throws
IllegalArgumentException.
import java.text.Bidi;

public class Testb {
    public static void main(String[] args) throws Throwable {
        char[] text = "a\u05D0a\na\u05D0\"\u05D0a".toCharArray();
        byte[] embeddings = { 0, 0, 0, -3, -3, 2, 2, 0, 3 };
        Bidi bd = new Bidi(text, 0, embeddings, 0, 9,
                Bidi.DIRECTION_RIGHT_TO_LEFT);
        // The requested line [2, 7) spans the \n at index 3.
        Bidi line = bd.createLineBidi(2, 7);
    }
}
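
A caller that wants the same result on both implementations could check the requested range before calling createLineBidi. The sketch below is a simplification and not what ICU does internally: the helper names are hypothetical, only \n, \r and U+2029 are treated as paragraph separators, and a \r\n pair is not treated as a single separator.

```java
import java.text.Bidi;

class LineRangeCheck {
    // Simplified assumption: only these characters end a paragraph.
    static boolean isParagraphSeparator(char c) {
        return c == '\n' || c == '\r' || c == 0x2029;
    }

    // True if the half-open range [start, limit) continues past a separator,
    // i.e. a separator appears before the last character of the range.
    static boolean crossesParagraphBoundary(char[] text, int start, int limit) {
        for (int i = start; i < limit - 1; i++) {
            if (isParagraphSeparator(text[i])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char[] text = "a\u05D0a\na\u05D0\"\u05D0a".toCharArray();

        // The range from the test above includes the \n at index 3.
        System.out.println(crossesParagraphBoundary(text, 2, 7)); // true
        // A range that stays inside the first paragraph.
        System.out.println(crossesParagraphBoundary(text, 0, 3)); // false

        // Embeddings are omitted (null) to keep the example short.
        Bidi para = new Bidi(text, 0, null, 0, text.length,
                Bidi.DIRECTION_RIGHT_TO_LEFT);
        if (!crossesParagraphBoundary(text, 0, 3)) {
            Bidi line = para.createLineBidi(0, 3);
            System.out.println("line length: " + line.getLength());
        }
    }
}
```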

Oliver Deakin wrote:
> Hi all,
>
> I have been looking recently at what it would take for us to step up 
> to icu4j 3.8 and thought I would give everyone a heads up on what I 
> have discovered.
>
> The first thing is that icu4jni is no longer supported from this 
> release onwards. The icu4jni APIs have been incorporated into icu4j and
> are implemented in pure Java now.
> Secondly, the Bidi class has also been implemented fully in icu4j now, 
> so it is possible for us to also drop icu4c as a dependency and use 
> pure icu4j for this functionality.
>
> The major advantage I see of moving to pure icu4j 3.8 is that we no 
> longer need to maintain prebuilt binaries of the icu4c and icu4jni 
> libraries across all platforms in our repository. This simplifies the 
> process of upgrading to new versions of icu and also allows us to move 
> to new platforms with greater ease.
>
> I am currently testing a patch to switch over to icu 3.8 and 
> completely remove the need for icu4c/jni. I have discovered a couple 
> of bugs in the new Bidi functionality [1] which I have raised on the 
> icu dev list and are in the process of being fixed. I hope that once 
> they are all resolved we will be able to pick up a patched icu4j 3.8 
> jar for our use.
>
> I'm interested to hear if anyone has any comments/objections to this?
>
> Regards,
> Oliver
>
> [1]
> http://bugs.icu-project.org/trac/ticket/5952
> http://bugs.icu-project.org/trac/ticket/5961
>

-- 
Oliver Deakin