Posted to dev@commons.apache.org by Matthew Pocock <tu...@gmail.com> on 2011/08/11 20:38:26 UTC

[codec] getting the bmpm code out there

Hi,

As those of you who've been following the CODEC-125 ticket will know, with
Greg's help I've got a port of the Beider-Morse Phonetic
Matching (BMPM) algorithm in as a string encoder. As far as I can tell, it's
ready for people to use and abuse. It ideally needs more test-case words,
but to the best of my knowledge it doesn't have any horrendous bugs or
performance issues.

The discussion on the ticket started to stray off bmpm and on to policy for
releases and changing APIs, and Sebb said we should discuss it on the list.
So, here we are.

Ideally, I'd like there to be a release of commons-codec some time soon so
that users can start to try out bmpm right away, and so that we can start
the process of adding it to the list of supported indexing methods in Solr.
What do people think?

Matthew

-- 
Dr Matthew Pocock
Visitor, School of Computing Science, Newcastle University
mailto: turingatemyhamster@gmail.com
gchat: turingatemyhamster@gmail.com
msn: matthew_pocock@yahoo.co.uk
irc.freenode.net: drdozer
tel: (0191) 2566550
mob: +447535664143

Re: [codec] getting the bmpm code out there

Posted by Gary Gregory <ga...@gmail.com>.
On Thu, Aug 11, 2011 at 4:10 PM, sebb <se...@gmail.com> wrote:
> On 11 August 2011 20:56, Gary Gregory <ga...@gmail.com> wrote:
>> Hello All!
>>
>> Topic 1: Housekeeping: package name and POM.
>>
>> The next codec release out of trunk will be a major release labeled 2.0;
>> the current release is 1.5.
>>
>> In trunk, I've removed deprecated methods and the project now requires
>> Java 5. This means 2.0 will not be a drop-in binary compatible release
>> for 1.5.
>>
>> I'd like to confirm or deny that this means the package name will
>> change to o.a.c.codec2 and that the POM groupId will have to change
>> from commons-codec to org.apache.commons. 2.0 and 1.5 would be able to
>> live side by side.
>
> Yes, the name changes are necessary to avoid problems with incompatible jars.

Ok, I'll do that tonight.

Gary

>
>> I'd like to get this out of the way first hence topic 1.
>>
>>
>> Topic 2: Beider-Morse (BM) Encoder API
>> https://issues.apache.org/jira/browse/CODEC-125
>>
>> BM is a new codec for 2.0.
>>
>> The encode API returns a set of encodings.
>>
>> In trunk, this is currently a String in the format "s1|s2|s3".
>>
>> I think this is not the best design, a set should be a Set, in this
>> case, an ordered set. Or, a List. Generally, it should be a Collection
>> of Strings.
>>
>> There was concern with call sites that generically use a [codec]
>> Encoder with the signature "Object encode(Object)" and call
>> toString() on the result.
>>
>> If we set the API to "Set<CharSequence> encode(CharSequence)" or
>> "Set<String> encode(String)", doing a toString() on a HashSet will
>> yield a usable String similar to what trunk does now. For example,
>> for a HashSet of Strings "a", "b" and "c", HashSet.toString() returns
>> "[a, b, c]" which is no worse than "a|b|c" IMO. At least it is a
>> documented and stable format.
>
> +1
>
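To make the Topic 2 comparison concrete, here is a minimal sketch that prints the same encodings both ways: pipe-joined, as described for trunk, and via the collection's own toString(). The class name EncodingFormats and the helper pipeJoined are made up for illustration and are not part of [codec]. A LinkedHashSet is used because HashSet makes no promise about iteration order, which matters if the result is meant to be an ordered set.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class EncodingFormats {

    /** Joins the encodings as "s1|s2|s3" (hypothetical helper, not a [codec] method). */
    public static String pipeJoined(Set<String> encodings) {
        StringBuilder sb = new StringBuilder();
        for (String encoding : encodings) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(encoding);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashSet iterates in insertion order, so both outputs are predictable.
        Set<String> encodings = new LinkedHashSet<String>(Arrays.asList("a", "b", "c"));
        System.out.println(pipeJoined(encodings)); // a|b|c
        System.out.println(encodings.toString());  // [a, b, c]
    }
}
```

The bracketed form comes from AbstractCollection.toString(), which is why it is the same for any standard Set implementation; only the element order depends on the implementation chosen.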
>> Topic 3: Generics
>>
>> This will be in a separate thread but I'd like to get this in 2.0
>> because this will likely break the API and I only want to break things
>> once and not have to do a codec3 for generics.
>
> +1.
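As a rough illustration of what Topic 3 could mean for call sites, the sketch below contrasts an Object-based encoder shape with a generified one and adapts between them. Everything here is an assumption for illustration: RawEncoder, TypedEncoder, LowerCaseEncoder and adapt are hypothetical names, and this is not the design of the generics branch mentioned in this thread.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

public class GenericEncoderSketch {

    /** Object-based shape, mirroring the "Object encode(Object)" signature discussed above. */
    public interface RawEncoder {
        Object encode(Object source);
    }

    /** Hypothetical generified shape: input and output types are visible to the compiler. */
    public interface TypedEncoder<I, O> {
        O encode(I source);
    }

    /** Toy encoder that returns a single-element set holding the lower-cased input. */
    public static class LowerCaseEncoder implements TypedEncoder<String, Set<String>> {
        public Set<String> encode(String source) {
            return Collections.singleton(source.toLowerCase(Locale.ROOT));
        }
    }

    /** Wraps a typed String encoder so that Object-based call sites can still use it. */
    public static RawEncoder adapt(final TypedEncoder<String, ?> typed) {
        return new RawEncoder() {
            public Object encode(Object source) {
                // The cast fails with ClassCastException if the caller passes a non-String.
                return typed.encode((String) source);
            }
        };
    }

    public static void main(String[] args) {
        TypedEncoder<String, Set<String>> typed = new LowerCaseEncoder();
        Object raw = adapt(typed).encode("AbC");
        System.out.println(raw); // [abc]
    }
}
```

A generic call site gets a Set<String> without casting, while an Object-based call site that only calls toString() on the result keeps working through the adapter.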
>
>> Thank you all,
>> Gary

-- 
Thank you,
Gary

http://garygregory.wordpress.com/
http://garygregory.com/
http://people.apache.org/~ggregory/
http://twitter.com/GaryGregory

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [codec] getting the bmpm code out there

Posted by Gary Gregory <ga...@gmail.com>.
On Fri, Aug 12, 2011 at 11:42 AM, sebb <se...@gmail.com> wrote:
> On 12 August 2011 16:08, Gary Gregory <ga...@gmail.com> wrote:
>> On Fri, Aug 12, 2011 at 10:35 AM, sebb <se...@gmail.com> wrote:
>>> On 12 August 2011 15:29, Gary Gregory <ga...@gmail.com> wrote:
>>>> On Fri, Aug 12, 2011 at 9:54 AM, sebb <se...@gmail.com> wrote:
>>>>> On 12 August 2011 14:33, Gary Gregory <ga...@gmail.com> wrote:
>>>>>> Can we proceed like so?
>>>>>>
>>>>>> - I'll save my generified codec in an svn branch ASAP.
>>>>>> - we can discuss that and get the best design
>>>>>> - is it binary compatible?
>>>>>
>>>>> Or can it be made binary-compatible without excessive compromises?
>>>>
>>>> I think we should look at the generics branch (when it is there), make
>>>> it the best we can, and then consider what it means to binary
>>>> compatibility and if it is worth achieving.
>>>>
>>>>>
>>>>>> - if not, which is my current view, then package is codec2
>>>>>
>>>>> Whatever the final decision, it's much easier to keep the package name
>>>>> as codec until just before release.
>>>>
>>>> trunk is codec2 ATM based on the binary incompatibility, due to
>>>> removed deprecated methods and other changes (field made final for
>>>> example.)
>>>
>>> Yes, if released as binary incompat. it will have to change to codec2,
>>> but the package change can (and IMO should) be left until the last
>>> moment.
>>>
>>> As it stands now, it's much harder to check for binary compat (have to
>>> use shade before using clirr).
>>> Also, if it turns out to be possible to maintain binary compat, the
>>> name change will have to be reverted.
>>
>> Ok, I'll revert the codec2 change after I save my generics branch,
>> hopefully later today or this weekend.
>
> Again, it would be easier to evaluate the branch if it uses the same
> package name.

Yes, I'll do it that way :)

Gary

>
>> Gary
>>
>>>
>>>> Gary
>>>>
>>>>>
>>>>>> We have lang3 and digester3 under our belts now with new packages. Are
>>>>>> we going to change policy again? I hope not. We sure spent a lot of
>>>>>> time on this and thought we made a sane decision as a community.
>>>>>> Joda-Time is its own world and can do what it wants, but I'd like to keep
>>>>>> my sanity in commons land with clear and consistent policies ;)
>>>>>
>>>>> It's not a change in policy; lang3 and digester3 are exceptions.
>>>>>
>>>>>> Wrt removing deprecations, we can revisit each change one at a time
>>>>>> if someone cares to data mine svn for the age of each or whatever
>>>>>> metric you want.
>>>>>
>>>>> +1
>>>>>
>>>>>> Cheers to all and thank you for your time and constructive feedback :)
>>>>>>
>>>>>> Gary
>>>>>>
>>>>>> On Aug 12, 2011, at 6:31, Stephen Colebourne <sc...@joda.org> wrote:
>>>>>>
>>>>>>> On 12 August 2011 11:19, sebb <se...@gmail.com> wrote:
>>>>>>>>> - Removing deprecated methods does not require a package name change
>>>>>>>>
>>>>>>>> How so?
>>>>>>>>
>>>>>>>> If there are any external references to them in an application that
>>>>>>>> cannot be removed, then both old and new jars will need to be
>>>>>>>> deployed.
>>>>>>>> Which cannot be done safely in a single classloader (no guarantee
>>>>>>>> which instance of duplicated classes will be loaded).
>>>>>>>> AFAIK Maven prevents duplicates anyway.
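When two versions of a jar end up on one classpath, a quick way to see which copy of a class actually won is to ask the class where it was loaded from. The sketch below uses only standard JDK calls; the class name WhichJar and the helper locationOf are made up for illustration.

```java
import java.security.CodeSource;

public class WhichJar {

    /** Returns the location a class was loaded from, or a note when none is recorded. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Core JDK classes such as java.lang.String typically have no code source.
        return source == null ? "(no code source)" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) {
        // For an application class this prints the jar or directory it came from.
        System.out.println(locationOf(WhichJar.class));
        System.out.println(locationOf(String.class));
    }
}
```

Calling locationOf on a [codec] class inside a deployed application would show which of the duplicated jars supplied it, which is the uncertainty described above.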
>>>>>>>
>>>>>>> In Joda-Time v2.0 I removed some deprecated methods and left others in
>>>>>>> (no package name change). Those that I removed were methods that were
>>>>>>> deprecated for a very long time (probably 4 years+), with multiple later
>>>>>>> versions with the deprecation and easy alternatives. Those that I did
>>>>>>> not remove were classes and methods that were probably still in use by
>>>>>>> people as they were once a primary API. This is a judgement call.
>>>>>>>
>>>>>>> And yes, removing a deprecated element means that another project that
>>>>>>> still uses the deprecation can no longer run. But if you've had
>>>>>>> something deprecated for over 3 years, that doesn't seem too harsh,
>>>>>>> unless it used to be a key/primary API.
>>>>>>>
>>>>>>> In hard cases, I'd rather see "NewFoo" or "Foo2" replacing "Foo"
>>>>>>> within the same package name, or a new sub-package within the same
>>>>>>> o.a.c.codec package space rather than o.a.c.codec2 for everything.
>>>>>>>
>>>>>>> Stephen
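The "Foo2 next to Foo" idea above can be sketched as follows. The names Foo and Foo2 come from Stephen's message; the methods and their behaviour are invented purely for illustration. The old class stays in place and is deprecated, so existing callers still link, while new code moves to the replacement in the same package.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class SamePackageEvolution {

    /** Hypothetical replacement API that returns every encoding as a list. */
    public static class Foo2 {
        public List<String> encodeAll(String input) {
            return Collections.singletonList(input.toUpperCase(Locale.ROOT));
        }
    }

    /** Hypothetical old API, kept (deprecated) so that existing callers still link. */
    @Deprecated
    public static class Foo {
        public String encode(String input) {
            // Delegate to the replacement and keep the old single-String contract.
            return new Foo2().encodeAll(input).get(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Foo().encode("ab"));     // AB
        System.out.println(new Foo2().encodeAll("ab")); // [AB]
    }
}
```

Because nothing is removed or renamed, the old and new APIs can ship in one jar under one package name, at the cost of carrying the deprecated class for some time.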


Re: [codec] getting the bmpm code out there

Posted by sebb <se...@gmail.com>.
On 12 August 2011 16:08, Gary Gregory <ga...@gmail.com> wrote:
> On Fri, Aug 12, 2011 at 10:35 AM, sebb <se...@gmail.com> wrote:
>> On 12 August 2011 15:29, Gary Gregory <ga...@gmail.com> wrote:
>>> On Fri, Aug 12, 2011 at 9:54 AM, sebb <se...@gmail.com> wrote:
>>>> On 12 August 2011 14:33, Gary Gregory <ga...@gmail.com> wrote:
>>>>> Can we proceed like so?
>>>>>
>>>>> - I'll save my generified codec in an svn branch ASAP.
>>>>> - we can discuss that and get the best design
>>>>> - is it binary compatible?
>>>>
>>>> Or can it be made binary-compatible without excessive compromises?
>>>
>>> I think we should look at the generics branch (when it is there), make
>>> it the best we can, and then consider what it means to binary
>>> compatibility and if it is worth achieving.
>>>
>>>>
>>>>> - if not, which is my current view, then package is codec2
>>>>
>>>> Whatever the final decision, it's much easier to keep the package name
>>>> as codec until just before release.
>>>
>>> trunk is codec2 ATM based on the binary incompatibility, due to
>>> removed deprecated methods and other changes (field made final for
>>> example.)
>>
>> Yes, if released as binary incompat. it will have to change to codec2,
>> but the package change can (and IMO should) be left until the last
>> moment.
>>
>> As it stands now, it's much harder to check for binary compat (have to
>> use shade before using clirr).
>> Also, if it turns out to be possible to maintain binary compat, the
>> name change will have to be reverted.
>
> Ok, I'll revert the codec2 change after I save my generics branch,
> hopefully later today or this weekend.

Again, it would be easier to evaluate the branch if it uses the same
package name.



Re: [codec] getting the bmpm code out there

Posted by Gary Gregory <ga...@gmail.com>.
On Fri, Aug 12, 2011 at 10:35 AM, sebb <se...@gmail.com> wrote:
> On 12 August 2011 15:29, Gary Gregory <ga...@gmail.com> wrote:
>> On Fri, Aug 12, 2011 at 9:54 AM, sebb <se...@gmail.com> wrote:
>>> On 12 August 2011 14:33, Gary Gregory <ga...@gmail.com> wrote:
>>>> Can we proceed like so?
>>>>
>>>> - I'll save my generified codec in an svn branch ASAP.
>>>> - we can discuss that and get the best design
>>>> - is it binary compatible?
>>>
>>> Or can it be made binary-compatible without excessive compromises?
>>
>> I think we should look at the generics branch (when it is there), make
>> it the best we can, and then consider what it means to binary
>> compatibility and if it is worth achieving.
>>
>>>
>>>> - if not, which is my current view, then package is codec2
>>>
>>> Whatever the final decision, it's much easier to keep the package name
>>> as codec until just before release.
>>
>> trunk is codec2 ATM based on the binary incompatibility, due to
>> removed deprecated methods and other changes (field made final for
>> example.)
>
> Yes, if released as binary incompat. it will have to change to codec2,
> but the package change can (and IMO should) be left until the last
> moment.
>
> As it stands now, it's much harder to check for binary compat (have to
> use shade before using clirr).
> Also, if it turns out to be possible to maintain binary compat, the
> name change will have to be reverted.

Ok, I'll revert the codec2 change after I save my generics branch,
hopefully later today or this weekend.

Gary



Re: [codec] getting the bmpm code out there

Posted by sebb <se...@gmail.com>.
On 12 August 2011 15:29, Gary Gregory <ga...@gmail.com> wrote:
> On Fri, Aug 12, 2011 at 9:54 AM, sebb <se...@gmail.com> wrote:
>> On 12 August 2011 14:33, Gary Gregory <ga...@gmail.com> wrote:
>>> Can we proceed like so?
>>>
>>> - I'll save my generified codec in an svn branch ASAP.
>>> - we can discuss that and get the best design
>>> - is it binary compatible?
>>
>> Or can it be made binary-compatible without excessive compromises?
>
> I think we should look at the generics branch (when it is there), make
> it the best we can, and then consider what it means to binary
> compatibility and if it is worth achieving.
>
>>
>>> - if not, which is my current view, then package is codec2
>>
>> Whatever the final decision, it's much easier to keep the package name
>> as codec until just before release.
>
> trunk is codec2 ATM based on the binary incompatibility, due to
> removed deprecated methods and other changes (field made final for
> example.)

Yes, if released as binary incompat. it will have to change to codec2,
but the package change can (and IMO should) be left until the last
moment.

As it stands now, it's much harder to check for binary compat (have to
use shade before using clirr).
Also, if it turns out to be possible to maintain binary compat, the
name change will have to be reverted.



Re: [codec] getting the bmpm code out there

Posted by Gary Gregory <ga...@gmail.com>.
On Fri, Aug 12, 2011 at 9:54 AM, sebb <se...@gmail.com> wrote:
> On 12 August 2011 14:33, Gary Gregory <ga...@gmail.com> wrote:
>> Can we proceed like so?
>>
>> - I'll save my generified codec in an svn branch ASAP.
>> - we can discuss that and get the best design
>> - is it binary compatible?
>
> Or can it be made binary-compatible without excessive compromises?

I think we should look at the generics branch (when it is there), make
it the best we can, and then consider what it means to binary
compatibility and if it is worth achieving.

>
>> - if not, which is my current view, then package is codec2
>
> Whatever the final decision, it's much easier to keep the package name
> as codec until just before release.

trunk is codec2 ATM based on the binary incompatibility, due to
removed deprecated methods and other changes (field made final for
example.)

Gary

>
>> We have lang3 and digester3 under our belts now with new packages. Are
>> we going to change policy again? I hope not. We sure spent a lot of
>> time on this and thought we made a sane decision as a community.
>> Joda-time is its own world can do what it wants but I'd like to keep
>> my sanity in commons land with clear and consistent policies ;)
>
> It's not a change in policy; lang3 and digester3 are exceptions.
>
>> Wrt to removing deprecations, we can revisit each change one at a time
>> if someone cares to data mine svn for the age of each or whatever
>> metric you want.
>
> +1
>
>> Cheers to all and thank you for your time and constructive feedback :)
>>
>> Gary
>>
>> On Aug 12, 2011, at 6:31, Stephen Colebourne <sc...@joda.org> wrote:
>>
>>> On 12 August 2011 11:19, sebb <se...@gmail.com> wrote:
>>>>> - Removing deprecated methods does not require a package name change
>>>>
>>>> How so?
>>>>
>>>> If there are any external references to them in an application that
>>>> cannot be removed, then both old and new jars will need to be
>>>> deployed.
>>>> Which cannot be done safely in a single classloader (no guarantee
>>>> which instance of duplicated classes will be loaded).
>>>> AFAIK Maven prevents duplicates anyway.
>>>
>>> In Joda-Time v2.0 I removed some deprecated methods and left others in
>>> (no package name change). Those that I removed were methods that were
>>> deprecated for a very long time (probably4years+), with multiple later
>>> versions with the deprecation and easy alternates. Those that I did
>>> not remove were classes and methods that were probably still in use by
>>> people as they were once a primary API. This is a judgement call.
>>>
>>> And yes, removing a deprecated element means that another project that
>>> still uses the deprecation can no longer run. But if you've had
>>> something deprecated for over 3 years, that doesn't seem too harsh,
>>> unless it used to be a key/primary API.
>>>
>>> In hard cases, I'd rather see "NewFoo" of "Foo2" replacing "Foo"
>>> within the same package name, or a new sub-package within the same
>>> o.a.c.codec package space rather than o.a.c.codec2 for everything.
>>>
>>> Stephen
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>



-- 
Thank you,
Gary

http://garygregory.wordpress.com/
http://garygregory.com/
http://people.apache.org/~ggregory/
http://twitter.com/GaryGregory



Re: [codec] getting the bmpm code out there

Posted by Stephen Colebourne <sc...@joda.org>.
On 12 August 2011 14:54, sebb <se...@gmail.com> wrote:
>> We have lang3 and digester3 under our belts now with new packages. Are
>> we going to change policy again? I hope not. We sure spent a lot of
>> time on this and thought we made a sane decision as a community.
>> Joda-Time is its own world and can do what it wants, but I'd like to keep
>> my sanity in commons land with clear and consistent policies ;)
>
> It's not a change in policy; lang3 and digester3 are exceptions.

I think of the rule as "if there are any significant (non long
deprecated) backwards incompatibilities, then a new package name is
needed, otherwise try as hard as possible to retain the same package
name".

Stephen



Re: [codec] getting the bmpm code out there

Posted by sebb <se...@gmail.com>.
On 12 August 2011 14:33, Gary Gregory <ga...@gmail.com> wrote:
> Can we proceed like so?
>
> - I'll save my generified codec in an svn branch ASAP.
> - we can discuss that and get the best design
> - is it binary compatible?

Or can it be made binary-compatible without excessive compromises?

> - if not, which is my current view, then package is codec2

Whatever the final decision, it's much easier to keep the package name
as codec until just before release.

> We have lang3 and digester3 under our belts now with new packages. Are
> we going to change policy again? I hope not. We sure spent a lot of
> time on this and thought we made a sane decision as a community.
> Joda-Time is its own world and can do what it wants, but I'd like to keep
> my sanity in commons land with clear and consistent policies ;)

It's not a change in policy; lang3 and digester3 are exceptions.

> Wrt to removing deprecations, we can revisit each change one at a time
> if someone cares to data mine svn for the age of each or whatever
> metric you want.

+1

> Cheers to all and thank you for your time and constructive feedback :)
>
> Gary
>
> On Aug 12, 2011, at 6:31, Stephen Colebourne <sc...@joda.org> wrote:
>
>> On 12 August 2011 11:19, sebb <se...@gmail.com> wrote:
>>>> - Removing deprecated methods does not require a package name change
>>>
>>> How so?
>>>
>>> If there are any external references to them in an application that
>>> cannot be removed, then both old and new jars will need to be
>>> deployed.
>>> Which cannot be done safely in a single classloader (no guarantee
>>> which instance of duplicated classes will be loaded).
>>> AFAIK Maven prevents duplicates anyway.
>>
>> In Joda-Time v2.0 I removed some deprecated methods and left others in
>> (no package name change). Those that I removed were methods that were
>> deprecated for a very long time (probably 4+ years), with multiple later
>> versions with the deprecation and easy alternates. Those that I did
>> not remove were classes and methods that were probably still in use by
>> people as they were once a primary API. This is a judgement call.
>>
>> And yes, removing a deprecated element means that another project that
>> still uses the deprecation can no longer run. But if you've had
>> something deprecated for over 3 years, that doesn't seem too harsh,
>> unless it used to be a key/primary API.
>>
>> In hard cases, I'd rather see "NewFoo" or "Foo2" replacing "Foo"
>> within the same package name, or a new sub-package within the same
>> o.a.c.codec package space rather than o.a.c.codec2 for everything.
>>
>> Stephen
>>


Re: [codec] getting the bmpm code out there

Posted by Gary Gregory <ga...@gmail.com>.
Can we proceed like so?

- I'll save my generified codec in an svn branch ASAP.
- we can discuss that and get the best design
- is it binary compatible?
- if not, which is my current view, then package is codec2

We have lang3 and digester3 under our belts now with new packages. Are
we going to change policy again? I hope not. We sure spent a lot of
time on this and thought we made a sane decision as a community.
Joda-Time is its own world and can do what it wants, but I'd like to keep
my sanity in commons land with clear and consistent policies ;)

Regarding the removed deprecations, we can revisit each change one at a
time if someone cares to data-mine svn for the age of each, or whatever
metric you want.

Cheers to all and thank you for your time and constructive feedback :)

Gary

On Aug 12, 2011, at 6:31, Stephen Colebourne <sc...@joda.org> wrote:

> On 12 August 2011 11:19, sebb <se...@gmail.com> wrote:
>>> - Removing deprecated methods does not require a package name change
>>
>> How so?
>>
>> If there are any external references to them in an application that
>> cannot be removed, then both old and new jars will need to be
>> deployed.
>> Which cannot be done safely in a single classloader (no guarantee
>> which instance of duplicated classes will be loaded).
>> AFAIK Maven prevents duplicates anyway.
>
> In Joda-Time v2.0 I removed some deprecated methods and left others in
> (no package name change). Those that I removed were methods that were
> deprecated for a very long time (probably 4+ years), with multiple later
> versions with the deprecation and easy alternates. Those that I did
> not remove were classes and methods that were probably still in use by
> people as they were once a primary API. This is a judgement call.
>
> And yes, removing a deprecated element means that another project that
> still uses the deprecation can no longer run. But if you've had
> something deprecated for over 3 years, that doesn't seem too harsh,
> unless it used to be a key/primary API.
>
> In hard cases, I'd rather see "NewFoo" or "Foo2" replacing "Foo"
> within the same package name, or a new sub-package within the same
> o.a.c.codec package space rather than o.a.c.codec2 for everything.
>
> Stephen
>


Re: [codec] getting the bmpm code out there

Posted by Stephen Colebourne <sc...@joda.org>.
On 12 August 2011 11:19, sebb <se...@gmail.com> wrote:
>> - Removing deprecated methods does not require a package name change
>
> How so?
>
> If there are any external references to them in an application that
> cannot be removed, then both old and new jars will need to be
> deployed.
> Which cannot be done safely in a single classloader (no guarantee
> which instance of duplicated classes will be loaded).
> AFAIK Maven prevents duplicates anyway.

In Joda-Time v2.0 I removed some deprecated methods and left others in
(no package name change). Those that I removed had been deprecated for
a very long time (probably 4+ years), across multiple later releases,
and had easy alternatives. Those that I did not remove were classes and
methods that were probably still in use by people, as they were once a
primary API. This is a judgement call.

And yes, removing a deprecated element means that another project that
still uses the deprecation can no longer run. But if you've had
something deprecated for over 3 years, that doesn't seem too harsh,
unless it used to be a key/primary API.

In hard cases, I'd rather see "NewFoo" or "Foo2" replacing "Foo"
within the same package name, or a new sub-package within the same
o.a.c.codec package space rather than o.a.c.codec2 for everything.

Stephen



Re: [codec] getting the bmpm code out there

Posted by sebb <se...@gmail.com>.
On 12 August 2011 08:37, Stephen Colebourne <sc...@joda.org> wrote:
> I've just noticed this thread.
>
> I'd like to ask those involved to consider if they can find a route
> where the package name and group do *not* change.
>
> - Changing to JDK 5 does not require a package name change (generics
> are backward compatible if the erased signatures don't change).

+1

> - Removing deprecated methods does not require a package name change

How so?

If there are any external references to them in an application that
cannot be removed, then both old and new jars will need to be
deployed.
Which cannot be done safely in a single classloader (no guarantee
which instance of duplicated classes will be loaded).
AFAIK Maven prevents duplicates anyway.
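
When two versions of a jar do end up on one classpath, it helps to see
which copy actually supplied a class. A minimal sketch using only the
JDK; the class name WhichJar and the "unknown" fallback are made up for
illustration, and the printed location depends entirely on how the
program is launched:

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of any Commons component.
final class WhichJar {

    // Returns the jar or directory a class was loaded from, or "unknown"
    // when the JVM exposes no code source (typical for core JDK classes).
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "unknown";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // An application class normally reports its jar or classes directory.
        System.out.println(locationOf(WhichJar.class));
        // A core JDK class, shown for comparison.
        System.out.println(locationOf(String.class));
    }
}
```

Calling locationOf on a codec class from inside the affected application
would show which of the duplicated jars won, which is exactly the part
the classloader does not guarantee.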

> (and potentially, we should just leave them in)

+1, unless the deprecated methods are so old that it is guaranteed
no-one is using them.

> - It is far far more user friendly to not have a "new" codec and an
> "old" codec in their path.

+1

> Looking back at [lang], I think that a different focus might have
> allowed 90% of the code to have been compatible and in the same
> package, with 10% in a new package. [codec] might well fit that model
> too.
>
> For example, if a method needs its signature changing, create a new
> method alongside and deprecate the original. Yes, it isn't as pretty,
> but it is generally better for users. Only if there are too many cases
> like this to keep the API sane, should a "new" codec (or any other
> commons lib) be created.

+1

I originally thought NET would need a package change to fix problems
with the NNTP API that used int rather than long, but it turned out to
be possible without too much stale code.
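
The int-to-long case could look roughly like the sketch below. The class
and method names are hypothetical and are not the Commons Net API; the
point is only that Java cannot overload on return type, so the long
variant needs its own name while the int method stays behind, deprecated:

```java
// Hypothetical class for illustration only; not the Commons Net API.
final class ArticleCounter {

    private final long lastArticle;

    ArticleCounter(long lastArticle) {
        this.lastArticle = lastArticle;
    }

    /**
     * Original accessor, kept so existing callers keep linking.
     *
     * @deprecated cannot represent values above Integer.MAX_VALUE;
     *             use {@link #getLastArticleLong()} instead.
     */
    @Deprecated
    public int getLastArticle() {
        // Throws ArithmeticException rather than silently truncating.
        return Math.toIntExact(lastArticle);
    }

    /** New accessor added alongside the old one. */
    public long getLastArticleLong() {
        return lastArticle;
    }

    public static void main(String[] args) {
        ArticleCounter counter = new ArticleCounter(5_000_000_000L);
        System.out.println(counter.getLastArticleLong());
    }
}
```

Whether the old method should throw or truncate on overflow is a design
choice; throwing is assumed here because it makes the problem visible.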

> Stephen
>
>
> On 11 August 2011 23:44, Gary Gregory <ga...@gmail.com> wrote:
>> On Thu, Aug 11, 2011 at 4:10 PM, sebb <se...@gmail.com> wrote:
>>> On 11 August 2011 20:56, Gary Gregory <ga...@gmail.com> wrote:
>>>> Hello All!
>>>>
>>>> Topic 1: Housekeeping: package name and POM.
>>>>
>>>> The next codec release out of trunk will be major release labeled 2.0,
>>>> the current release is 1.5.
>>>>
>>>> In trunk, I've removed deprecated methods and the project now requires
>>>> Java 5. This means 2.0 will not be a drop-in binary compatible release
>>>> for 1.5.
>>>>
>>>> I'd like to confirm or deny that this means the package name will
>>>> change to o.a.c.codec2 and that the POM groupId will have to change
>>>> from commons-codec to org.apache.commons. 2.0 and 1.5 would be able to
>>>> live side by side.
>>>
>>> Yes, the name changes are necessary to avoid problems with incompatible jars.
>>>
>>>> I'd like to get this out of the way first hence topic 1.
>>>>
>>>>
>>>> Topic 2: Beider-Morse (BM) Encoder API
>>>> https://issues.apache.org/jira/browse/CODEC-125
>>>>
>>>> BM is a new codec for 2.0.
>>>>
>>>> The encode API returns a set of encodings.
>>>>
>>>> In trunk, this is currently a String in the format "s1|s2|s3".
>>>>
>>>> I think this is not the best design, a set should be a Set, in this
>>>> case, an ordered set. Or, a List. Generally, it should be a Collection
>>>> of Strings.
>>>>
>>>> There was concern with call sites that generically use a [codec]
>>>> Encoder with the signature "Object encoder(Object)" and call
>>>> toString() on the result.
>>>>
>>>> If we set the API to "CharSequence encode(Set<CharSequence>)" or
>>>> "String encode(Set<String>)", doing a toString() on a HashSet will
>>>> yield a usable String similar as to what trunk does now. For example,
>>>> for a HashSet of Strings "a", "b" and "c", HashSet.toString() returns
>>>> "[a, b, c]" which is no worse than "a|b|c" IMO. At least it is a
>>>> documented and stable format.
>>>
>>> +1
>>>
>>>> Topic 3: Generics
>>>>
>>>> This will be in a separate thread but I'd like to get this in 2.0
>>>> because this will likely break the API and I only want to break things
>>>> once and not have to do a codec3 for generics.
>>>
>>> +1.
>>
>> I'll work on a generified codec2 over the next couple of days and
>> present what that looks like, maybe in a branch, or a patch.
>>
>> Gary
>>
>>>
>>>> Thank you all,
>>>> Gary
>>>>
>>>> On Thu, Aug 11, 2011 at 2:38 PM, Matthew Pocock
>>>> <tu...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> As those of you who've been following the CODEC-125 ticket will know, with
>>>>> Greg's help I've got a port of the beider morse phonetic
>>>>> matching (bmpm) algorithm in as a string encoder. As far as I can tell, it's
>>>>> ready for people to use and abuse. It ideally needs more test-case words,
>>>>> but to the best of my knowledge it doesn't have any horrendous bugs or
>>>>> performance issues.
>>>>>
>>>>> The discussion on the ticket started to stray off bmpm and on to policy for
>>>>> releases and changing APIs, and Sebb said we should discuss it on the list.
>>>>> So, here we are.
>>>>>
>>>>> Ideally, I'd like there to be a release of commons-codec some time soon so
>>>>> that users can start to try out bmpm right away, and so that we can start
>>>>> the process of adding it to the list of supported indexing methods in solr.
>>>>> What do people think?
>>>>>
>>>>> Matthew
>>>>>
>>>>> --
>>>>> Dr Matthew Pocock
>>>>> Visitor, School of Computing Science, Newcastle University
>>>>> mailto: turingatemyhamster@gmail.com
>>>>> gchat: turingatemyhamster@gmail.com
>>>>> msn: matthew_pocock@yahoo.co.uk
>>>>> irc.freenode.net: drdozer
>>>>> tel: (0191) 2566550
>>>>> mob: +447535664143
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thank you,
>>>> Gary
>>>>
>>>> http://garygregory.wordpress.com/
>>>> http://garygregory.com/
>>>> http://people.apache.org/~ggregory/
>>>> http://twitter.com/GaryGregory
>>>>
>>
>>
>>
>> --
>> Thank you,
>> Gary
>>
>> http://garygregory.wordpress.com/
>> http://garygregory.com/
>> http://people.apache.org/~ggregory/
>> http://twitter.com/GaryGregory
>>


Re: [codec] getting the bmpm code out there

Posted by Stephen Colebourne <sc...@joda.org>.
I've just noticed this thread.

I'd like to ask those involved to consider if they can find a route
where the package name and group do *not* change.

- Changing to JDK 5 does not require a package name change (generics
are backward compatible if the erased signatures don't change).
- Removing deprecated methods does not require a package name change
(and potentially, we should just leave them in)
- It is far far more user friendly to not have a "new" codec and an
"old" codec in their path.
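
The erasure point in the first bullet can be checked with reflection. A
minimal sketch, assuming two made-up interfaces (RawEncoder and
GenericEncoder are illustrations, not the [codec] types): because the
type parameter is unbounded it erases to Object, so both interfaces
expose the same bytecode-level signature.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a pre-generics API: Object in, Object out.
interface RawEncoder {
    Object encode(Object source);
}

// Hypothetical generified version. T is unbounded, so it erases to Object.
interface GenericEncoder<T> {
    T encode(T source);
}

final class ErasureDemo {

    // Describes the erased signature of the encode(Object) method of a type.
    static String erasedSignature(Class<?> type) {
        try {
            Method m = type.getMethod("encode", Object.class);
            return m.getReturnType().getName() + " " + m.getName()
                    + "(" + m.getParameterTypes()[0].getName() + ")";
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("no encode(Object) on " + type, e);
        }
    }

    public static void main(String[] args) {
        // Both lines print the same erased signature.
        System.out.println(erasedSignature(RawEncoder.class));
        System.out.println(erasedSignature(GenericEncoder.class));
    }
}
```

A bounded parameter (say, T extends CharSequence) would erase to the
bound instead of Object, and that is the kind of change that does alter
the erased signature.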

Looking back at [lang], I think that a different focus might have
allowed 90% of the code to have been compatible and in the same
package, with 10% in a new package. [codec] might well fit that model
too.

For example, if a method needs its signature changing, create a new
method alongside and deprecate the original. Yes, it isn't as pretty,
but it is generally better for users. Only if there are too many cases
like this to keep the API sane, should a "new" codec (or any other
commons lib) be created.
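
As a rough sketch of that approach (PipeEncoder and both of its methods
are hypothetical, and the "encoding" is a placeholder rather than any
real [codec] algorithm), the old String-returning method stays and
delegates to a new method with the better signature:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical encoder for illustration only; not the commons-codec API.
final class PipeEncoder {

    /**
     * Original method, left in place so existing callers keep linking.
     *
     * @deprecated use {@link #encodeToSet(String)} instead.
     */
    @Deprecated
    public String encode(String source) {
        return String.join("|", encodeToSet(source));
    }

    /** New method added alongside the old one instead of replacing it. */
    public Set<String> encodeToSet(String source) {
        // Placeholder "encoding": lower-case the input and split on whitespace.
        String[] tokens = source.trim().toLowerCase(Locale.ROOT).split("\\s+");
        return new LinkedHashSet<>(Arrays.asList(tokens));
    }

    public static void main(String[] args) {
        PipeEncoder encoder = new PipeEncoder();
        System.out.println(encoder.encodeToSet("Alpha beta"));
        System.out.println(encoder.encode("Alpha beta"));
    }
}
```

Code compiled against the old method keeps working unchanged, and the
deprecation warning points callers at the replacement.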

Stephen


On 11 August 2011 23:44, Gary Gregory <ga...@gmail.com> wrote:
> On Thu, Aug 11, 2011 at 4:10 PM, sebb <se...@gmail.com> wrote:
>> On 11 August 2011 20:56, Gary Gregory <ga...@gmail.com> wrote:
>>> Hello All!
>>>
>>> Topic 1: Housekeeping: package name and POM.
>>>
>>> The next codec release out of trunk will be major release labeled 2.0,
>>> the current release is 1.5.
>>>
>>> In trunk, I've removed deprecated methods and the project now requires
>>> Java 5. This means 2.0 will not be a drop-in binary compatible release
>>> for 1.5.
>>>
>>> I'd like to confirm or deny that this means the package name will
>>> change to o.a.c.codec2 and that the POM groupId will have to change
>>> from commons-codec to org.apache.commons. 2.0 and 1.5 would be able to
>>> live side by side.
>>
>> Yes, the name changes are necessary to avoid problems with incompatible jars.
>>
>>> I'd like to get this out of the way first hence topic 1.
>>>
>>>
>>> Topic 2: Beider-Morse (BM) Encoder API
>>> https://issues.apache.org/jira/browse/CODEC-125
>>>
>>> BM is a new codec for 2.0.
>>>
>>> The encode API returns a set of encodings.
>>>
>>> In trunk, this is currently a String in the format "s1|s2|s3".
>>>
>>> I think this is not the best design, a set should be a Set, in this
>>> case, an ordered set. Or, a List. Generally, it should be a Collection
>>> of Strings.
>>>
>>> There was concern with call sites that generically use a [codec]
>>> Encoder with the signature "Object encoder(Object)" and call
>>> toString() on the result.
>>>
>>> If we set the API to "CharSequence encode(Set<CharSequence>)" or
>>> "String encode(Set<String>)", doing a toString() on a HashSet will
>>> yield a usable String similar as to what trunk does now. For example,
>>> for a HashSet of Strings "a", "b" and "c", HashSet.toString() returns
>>> "[a, b, c]" which is no worse than "a|b|c" IMO. At least it is a
>>> documented and stable format.
>>
>> +1
>>
>>> Topic 3: Generics
>>>
>>> This will be in a separate thread but I'd like to get this in 2.0
>>> because this will likely break the API and I only want to break things
>>> once and not have to do a codec3 for generics.
>>
>> +1.
>
> I'll work on a generified codec2 over the next couple of days and
> present what that looks like, maybe in a branch, or a patch.
>
> Gary
>
>>
>>> Thank you all,
>>> Gary
>>>
>>> On Thu, Aug 11, 2011 at 2:38 PM, Matthew Pocock
>>> <tu...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> As those of you who've been following the CODEC-125 ticket will know, with
>>>> Greg's help I've got a port of the beider morse phonetic
>>>> matching (bmpm) algorithm in as a string encoder. As far as I can tell, it's
>>>> ready for people to use and abuse. It ideally needs more test-case words,
>>>> but to the best of my knowledge it doesn't have any horrendous bugs or
>>>> performance issues.
>>>>
>>>> The discussion on the ticket started to stray off bmpm and on to policy for
>>>> releases and changing APIs, and Sebb said we should discuss it on the list.
>>>> So, here we are.
>>>>
>>>> Ideally, I'd like there to be a release of commons-codec some time soon so
>>>> that users can start to try out bmpm right away, and so that we can start
>>>> the process of adding it to the list of supported indexing methods in solr.
>>>> What do people think?
>>>>
>>>> Matthew
>>>>
>>>> --
>>>> Dr Matthew Pocock
>>>> Visitor, School of Computing Science, Newcastle University
>>>> mailto: turingatemyhamster@gmail.com
>>>> gchat: turingatemyhamster@gmail.com
>>>> msn: matthew_pocock@yahoo.co.uk
>>>> irc.freenode.net: drdozer
>>>> tel: (0191) 2566550
>>>> mob: +447535664143
>>>>
>>>
>>>
>>>
>>> --
>>> Thank you,
>>> Gary
>>>
>>> http://garygregory.wordpress.com/
>>> http://garygregory.com/
>>> http://people.apache.org/~ggregory/
>>> http://twitter.com/GaryGregory
>>>
>
>
>
> --
> Thank you,
> Gary
>
> http://garygregory.wordpress.com/
> http://garygregory.com/
> http://people.apache.org/~ggregory/
> http://twitter.com/GaryGregory
>


Re: [codec] getting the bmpm code out there

Posted by Gary Gregory <ga...@gmail.com>.
On Thu, Aug 11, 2011 at 4:10 PM, sebb <se...@gmail.com> wrote:
> On 11 August 2011 20:56, Gary Gregory <ga...@gmail.com> wrote:
>> Hello All!
>>
>> Topic 1: Housekeeping: package name and POM.
>>
>> The next codec release out of trunk will be major release labeled 2.0,
>> the current release is 1.5.
>>
>> In trunk, I've removed deprecated methods and the project now requires
>> Java 5. This means 2.0 will not be a drop-in binary compatible release
>> for 1.5.
>>
>> I'd like to confirm or deny that this means the package name will
>> change to o.a.c.codec2 and that the POM groupId will have to change
>> from commons-codec to org.apache.commons. 2.0 and 1.5 would be able to
>> live side by side.
>
> Yes, the name changes are necessary to avoid problems with incompatible jars.
>
>> I'd like to get this out of the way first hence topic 1.
>>
>>
>> Topic 2: Beider-Morse (BM) Encoder API
>> https://issues.apache.org/jira/browse/CODEC-125
>>
>> BM is a new codec for 2.0.
>>
>> The encode API returns a set of encodings.
>>
>> In trunk, this is currently a String in the format "s1|s2|s3".
>>
>> I think this is not the best design, a set should be a Set, in this
>> case, an ordered set. Or, a List. Generally, it should be a Collection
>> of Strings.
>>
>> There was concern with call sites that generically use a [codec]
>> Encoder with the signature "Object encoder(Object)" and call
>> toString() on the result.
>>
>> If we set the API to "CharSequence encode(Set<CharSequence>)" or
>> "String encode(Set<String>)", doing a toString() on a HashSet will
>> yield a usable String similar as to what trunk does now. For example,
>> for a HashSet of Strings "a", "b" and "c", HashSet.toString() returns
>> "[a, b, c]" which is no worse than "a|b|c" IMO. At least it is a
>> documented and stable format.
>
> +1
>
>> Topic 3: Generics
>>
>> This will be in a separate thread but I'd like to get this in 2.0
>> because this will likely break the API and I only want to break things
>> once and not have to do a codec3 for generics.
>
> +1.

I'll work on a generified codec2 over the next couple of days and
present what that looks like, maybe in a branch, or a patch.

Gary

>
>> Thank you all,
>> Gary
>>
>> On Thu, Aug 11, 2011 at 2:38 PM, Matthew Pocock
>> <tu...@gmail.com> wrote:
>>> Hi,
>>>
>>> As those of you who've been following the CODEC-125 ticket will know, with
>>> Greg's help I've got a port of the beider morse phonetic
>>> matching (bmpm) algorithm in as a string encoder. As far as I can tell, it's
>>> ready for people to use and abuse. It ideally needs more test-case words,
>>> but to the best of my knowledge it doesn't have any horrendous bugs or
>>> performance issues.
>>>
>>> The discussion on the ticket started to stray off bmpm and on to policy for
>>> releases and changing APIs, and Sebb said we should discuss it on the list.
>>> So, here we are.
>>>
>>> Ideally, I'd like there to be a release of commons-codec some time soon so
>>> that users can start to try out bmpm right away, and so that we can start
>>> the process of adding it to the list of supported indexing methods in solr.
>>> What do people think?
>>>
>>> Matthew
>>>
>>> --
>>> Dr Matthew Pocock
>>> Visitor, School of Computing Science, Newcastle University
>>> mailto: turingatemyhamster@gmail.com
>>> gchat: turingatemyhamster@gmail.com
>>> msn: matthew_pocock@yahoo.co.uk
>>> irc.freenode.net: drdozer
>>> tel: (0191) 2566550
>>> mob: +447535664143
>>>
>>
>>
>>
>> --
>> Thank you,
>> Gary
>>
>> http://garygregory.wordpress.com/
>> http://garygregory.com/
>> http://people.apache.org/~ggregory/
>> http://twitter.com/GaryGregory
>>



-- 
Thank you,
Gary

http://garygregory.wordpress.com/
http://garygregory.com/
http://people.apache.org/~ggregory/
http://twitter.com/GaryGregory



Re: [codec] getting the bmpm code out there

Posted by sebb <se...@gmail.com>.
On 11 August 2011 20:56, Gary Gregory <ga...@gmail.com> wrote:
> Hello All!
>
> Topic 1: Housekeeping: package name and POM.
>
> The next codec release out of trunk will be major release labeled 2.0,
> the current release is 1.5.
>
> In trunk, I've removed deprecated methods and the project now requires
> Java 5. This means 2.0 will not be a drop-in binary compatible release
> for 1.5.
>
> I'd like to confirm or deny that this means the package name will
> change to o.a.c.codec2 and that the POM groupId will have to change
> from commons-codec to org.apache.commons. 2.0 and 1.5 would be able to
> live side by side.

Yes, the name changes are necessary to avoid problems with incompatible jars.

> I'd like to get this out of the way first hence topic 1.
>
>
> Topic 2: Beider-Morse (BM) Encoder API
> https://issues.apache.org/jira/browse/CODEC-125
>
> BM is a new codec for 2.0.
>
> The encode API returns a set of encodings.
>
> In trunk, this is currently a String in the format "s1|s2|s3".
>
> I think this is not the best design, a set should be a Set, in this
> case, an ordered set. Or, a List. Generally, it should be a Collection
> of Strings.
>
> There was concern with call sites that generically use a [codec]
> Encoder with the signature "Object encoder(Object)" and call
> toString() on the result.
>
> If we set the API to "CharSequence encode(Set<CharSequence>)" or
> "String encode(Set<String>)", doing a toString() on a HashSet will
> yield a usable String similar as to what trunk does now. For example,
> for a HashSet of Strings "a", "b" and "c", HashSet.toString() returns
> "[a, b, c]" which is no worse than "a|b|c" IMO. At least it is a
> documented and stable format.

+1

> Topic 3: Generics
>
> This will be in a separate thread but I'd like to get this in 2.0
> because this will likely break the API and I only want to break things
> once and not have to do a codec3 for generics.

+1.

> Thank you all,
> Gary
>
> On Thu, Aug 11, 2011 at 2:38 PM, Matthew Pocock
> <tu...@gmail.com> wrote:
>> Hi,
>>
>> As those of you who've been following the CODEC-125 ticket will know, with
>> Greg's help I've got a port of the beider morse phonetic
>> matching (bmpm) algorithm in as a string encoder. As far as I can tell, it's
>> ready for people to use and abuse. It ideally needs more test-case words,
>> but to the best of my knowledge it doesn't have any horrendous bugs or
>> performance issues.
>>
>> The discussion on the ticket started to stray off bmpm and on to policy for
>> releases and changing APIs, and Sebb said we should discuss it on the list.
>> So, here we are.
>>
>> Ideally, I'd like there to be a release of commons-codec some time soon so
>> that users can start to try out bmpm right away, and so that we can start
>> the process of adding it to the list of supported indexing methods in solr.
>> What do people think?
>>
>> Matthew
>>
>> --
>> Dr Matthew Pocock
>> Visitor, School of Computing Science, Newcastle University
>> mailto: turingatemyhamster@gmail.com
>> gchat: turingatemyhamster@gmail.com
>> msn: matthew_pocock@yahoo.co.uk
>> irc.freenode.net: drdozer
>> tel: (0191) 2566550
>> mob: +447535664143
>>
>
>
>
> --
> Thank you,
> Gary
>
> http://garygregory.wordpress.com/
> http://garygregory.com/
> http://people.apache.org/~ggregory/
> http://twitter.com/GaryGregory
>


Re: [codec] getting the bmpm code out there

Posted by Gary Gregory <ga...@gmail.com>.
Hello All!

Topic 1: Housekeeping: package name and POM.

The next codec release out of trunk will be a major release labeled 2.0;
the current release is 1.5.

In trunk, I've removed deprecated methods and the project now requires
Java 5. This means 2.0 will not be a drop-in, binary-compatible
replacement for 1.5.

I'd like to confirm or deny that this means the package name will
change to o.a.c.codec2 and that the POM groupId will have to change
from commons-codec to org.apache.commons. 2.0 and 1.5 would be able to
live side by side.

I'd like to get this out of the way first, hence topic 1.


Topic 2: Beider-Morse (BM) Encoder API
https://issues.apache.org/jira/browse/CODEC-125

BM is a new codec for 2.0.

The encode API returns a set of encodings.

In trunk, this is currently a String in the format "s1|s2|s3".

I think this is not the best design; a set should be a Set, in this
case, an ordered set. Or, a List. Generally, it should be a Collection
of Strings.

There was concern with call sites that generically use a [codec]
Encoder with the signature "Object encode(Object)" and call
toString() on the result.

If we set the API to "Set<CharSequence> encode(CharSequence)" or
"Set<String> encode(String)", doing a toString() on a HashSet will
yield a usable String similar to what trunk does now. For example,
for a HashSet of Strings "a", "b" and "c", HashSet.toString() returns
"[a, b, c]", which is no worse than "a|b|c" IMO. At least it is a
documented and stable format.
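
To make the two formats concrete, here is a small sketch (SetFormatDemo
is a made-up name). It assumes a LinkedHashSet rather than a plain
HashSet, because HashSet does not guarantee iteration order and the
point above was that the result should be an ordered set:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo class; the values stand in for phonetic encodings.
final class SetFormatDemo {

    // Insertion-ordered set, so the printed order is predictable.
    static Set<String> sample() {
        return new LinkedHashSet<>(Arrays.asList("a", "b", "c"));
    }

    public static void main(String[] args) {
        Set<String> encodings = sample();
        // Format inherited from AbstractCollection.toString().
        System.out.println(encodings);                    // [a, b, c]
        // The pipe-separated format currently used in trunk.
        System.out.println(String.join("|", encodings));  // a|b|c
    }
}
```

Callers that need the old pipe-separated form can still produce it from
the collection in one line, while the reverse requires parsing.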


Topic 3: Generics

This will be in a separate thread but I'd like to get this in 2.0
because this will likely break the API and I only want to break things
once and not have to do a codec3 for generics.

Thank you all,
Gary

On Thu, Aug 11, 2011 at 2:38 PM, Matthew Pocock
<tu...@gmail.com> wrote:
> Hi,
>
> As those of you who've been following the CODEC-125 ticket will know, with
> Greg's help I've got a port of the beider morse phonetic
> matching (bmpm) algorithm in as a string encoder. As far as I can tell, it's
> ready for people to use and abuse. It ideally needs more test-case words,
> but to the best of my knowledge it doesn't have any horrendous bugs or
> performance issues.
>
> The discussion on the ticket started to stray off bmpm and on to policy for
> releases and changing APIs, and Sebb said we should discuss it on the list.
> So, here we are.
>
> Ideally, I'd like there to be a release of commons-codec some time soon so
> that users can start to try out bmpm right away, and so that we can start
> the process of adding it to the list of supported indexing methods in solr.
> What do people think?
>
> Matthew
>
> --
> Dr Matthew Pocock
> Visitor, School of Computing Science, Newcastle University
> mailto: turingatemyhamster@gmail.com
> gchat: turingatemyhamster@gmail.com
> msn: matthew_pocock@yahoo.co.uk
> irc.freenode.net: drdozer
> tel: (0191) 2566550
> mob: +447535664143
>



-- 
Thank you,
Gary

http://garygregory.wordpress.com/
http://garygregory.com/
http://people.apache.org/~ggregory/
http://twitter.com/GaryGregory



Re: [codec] getting the bmpm code out there

Posted by Henri Yandell <fl...@gmail.com>.
On Thu, Aug 11, 2011 at 12:33 PM, sebb <se...@gmail.com> wrote:
> On 11 August 2011 19:38, Matthew Pocock <tu...@gmail.com> wrote:
>> Hi,
>>
>> As those of you who've been following the CODEC-125 ticket will know, with
>> Greg's help I've got a port of the beider morse phonetic
>> matching (bmpm) algorithm in as a string encoder. As far as I can tell, it's
>> ready for people to use and abuse. It ideally needs more test-case words,
>> but to the best of my knowledge it doesn't have any horrendous bugs or
>> performance issues.
>>
>> The discussion on the ticket started to stray off bmpm and on to policy for
>> releases and changing APIs, and Sebb said we should discuss it on the list.
>> So, here we are.
>>
>> Ideally, I'd like there to be a release of commons-codec some time soon so
>> that users can start to try out bmpm right away, and so that we can start
>> the process of adding it to the list of supported indexing methods in solr.
>> What do people think?
>
> The reason I raised the issue was that the API seems to be currently
> in a state of flux.
>
> Most Commons components strive for binary compatibility between releases.
> Where this cannot be achieved, normally this requires a change of
> package name and Maven id (as well as major version bump).
> This is to avoid the jar hell that can occur where two or more other
> pieces of code require different versions of the API, and where it's
> not possible to update all references to the changed code at once.
>
> In this case, because the BMPM code is new, it might be possible to
> relax the requirement somewhat, so long as the code API is documented
> as being unstable.
>
> If we do have to change BMPM in a way that is not binary compatible,
> then all code that uses the BMPM classes will need to be updated.
> This should be much less of an issue than if (say) the Base64 classes
> were changed, as there should not be many external classes that use
> BMPM in a given application.

My preference is to put unstable code in an unstable package name.

org.apache.commons.codec.unstable.encoders etc.

Very clear and we can later quite happily move it to the right package
name with a clear conscience.

Hen



Re: [codec] getting the bmpm code out there

Posted by sebb <se...@gmail.com>.
On 11 August 2011 20:55, Matthew Pocock <tu...@gmail.com> wrote:
> Hi Sebb,
>
>
>> The reason I raised the issue was that the API seems to be currently
>> in a state of flux.
>>
>
> The BMPM code has not appeared in a previous release. It is a discrete
> addition that doesn't alter any existing code, and as far as I know,
> currently no 3rd party code relies upon it. Right now on trunk, it is a
> StringEncoder.

OK

>
>> In this case, because the BMPM code is new, it might be possible to
>> relax the requirement somewhat, so long as the code API is documented
>> as being unstable.
>>
>
> I've no problem with marking it as new or unstable or whatever the right
> word is. While it extends StringEncoder, the API is stable. Although there
> may be more flux with the finer details of the string you get out for the
> string you put in as we fix bugs and update the rule tables, this shouldn't
> alter how clients (users of the API) call this code, only the quality of the
> results they get back.

OK, that won't affect binary compat.

>
>>
>> If we do have to change BMPM in a way that is not binary compatible,
>> then all code that uses the BMPM classes will need to be updated.
>>
>
> Understood. I think this only becomes an issue if/when Encoder becomes
> generified, and at that point clearly we need a big version bump, with all
> the associated changes, and all encoders and their clients would be equally
> affected.

Indeed.



Re: [codec] getting the bmpm code out there

Posted by Matthew Pocock <tu...@gmail.com>.
Hi Sebb,


> The reason I raised the issue was that the API seems to be currently
> in a state of flux.
>

The BMPM code has not appeared in a previous release. It is a discrete
addition that doesn't alter any existing code, and as far as I know,
currently no 3rd party code relies upon it. Right now on trunk, it is a
StringEncoder.


> In this case, because the BMPM code is new, it might be possible to
> relax the requirement somewhat, so long as the code API is documented
> as being unstable.
>

I've no problem with marking it as new or unstable or whatever the right
word is. While it extends StringEncoder, the API is stable. Although there
may be more flux with the finer details of the string you get out for the
string you put in as we fix bugs and update the rule tables, this shouldn't
alter how clients (users of the API) call this code, only the quality of the
results they get back.
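
To illustrate the point about call sites, here is a minimal sketch.
PhoneticEncoder is a hypothetical stand-in for a String-to-String
encoder interface, not the actual [codec] StringEncoder, and both
implementations are toy placeholders rather than Beider-Morse rules.
The assumption is only that clients code against the interface.

```java
import java.util.Locale;

/** Hypothetical stand-in for a String-to-String encoder contract. */
interface PhoneticEncoder {
    String encode(String source);
}

final class StableCallSiteSketch {

    /** Client code: depends only on the interface, never on the rule tables. */
    static boolean soundsAlike(PhoneticEncoder encoder, String a, String b) {
        return encoder.encode(a).equals(encoder.encode(b));
    }

    public static void main(String[] args) {
        // Toy encoder standing in for the rules before an update.
        PhoneticEncoder before = new PhoneticEncoder() {
            public String encode(String source) {
                return source.toLowerCase(Locale.ROOT);
            }
        };
        // Toy encoder standing in for the rules after an update.
        PhoneticEncoder after = new PhoneticEncoder() {
            public String encode(String source) {
                return source.toLowerCase(Locale.ROOT).replace("ph", "f");
            }
        };
        // The call is identical in both cases; only the answer differs.
        System.out.println(soundsAlike(before, "Philip", "Filip"));
        System.out.println(soundsAlike(after, "Philip", "Filip"));
    }
}
```

The soundsAlike call site compiles and runs unchanged against either
encoder; what changes between the two is the result it gets back.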


>
> If we do have to change BMPM in a way that is not binary compatible,
> then all code that uses the BMPM classes will need to be updated.
>

Understood. I think this only becomes an issue if/when Encoder becomes
generified, and at that point clearly we need a big version bump, with all
the associated changes, and all encoders and their clients would be equally
affected.

Does that help, or have I further muddied the waters?

Matthew

-- 
Dr Matthew Pocock
Visitor, School of Computing Science, Newcastle University
mailto: turingatemyhamster@gmail.com
gchat: turingatemyhamster@gmail.com
msn: matthew_pocock@yahoo.co.uk
irc.freenode.net: drdozer
tel: (0191) 2566550
mob: +447535664143

Re: [codec] getting the bmpm code out there

Posted by sebb <se...@gmail.com>.
On 11 August 2011 19:38, Matthew Pocock <tu...@gmail.com> wrote:
> Hi,
>
> As those of you who've been following the CODEC-125 ticket will know, with
> Greg's help I've got a port of the beider morse phonetic
> matching (bmpm) algorithm in as a string encoder. As far as I can tell, it's
> ready for people to use and abuse. It ideally needs more test-case words,
> but to the best of my knowledge it doesn't have any horrendous bugs or
> performance issues.
>
> The discussion on the ticket started to stray off bmpm and on to policy for
> releases and changing APIs, and Sebb said we should discuss it on the list.
> So, here we are.
>
> Ideally, I'd like there to be a release of commons-codec some time soon so
> that users can start to try out bmpm right away, and so that we can start
> the process of adding it to the list of supported indexing methods in solr.
> What do people think?

The reason I raised the issue was that the API seems to be currently
in a state of flux.

Most Commons components strive for binary compatibility between releases.
Where this cannot be achieved, normally this requires a change of
package name and Maven id (as well as major version bump).
This is to avoid the jar hell that can occur where two or more other
pieces of code require different versions of the API, and where it's
not possible to update all references to the changed code at once.

In this case, because the BMPM code is new, it might be possible to
relax the requirement somewhat, so long as the code API is documented
as being unstable.

If we do have to change BMPM in a way that is not binary compatible,
then all code that uses the BMPM classes will need to be updated.
This should be much less of an issue than if (say) the Base64 classes
were changed, as there should not be many external classes that use
BMPM in a given application.


> Matthew
>
> --
> Dr Matthew Pocock
> Visitor, School of Computing Science, Newcastle University
> mailto: turingatemyhamster@gmail.com
> gchat: turingatemyhamster@gmail.com
> msn: matthew_pocock@yahoo.co.uk
> irc.freenode.net: drdozer
> tel: (0191) 2566550
> mob: +447535664143
>
