Posted to commits@commons.apache.org by tn...@apache.org on 2012/03/08 22:19:56 UTC

svn commit: r1298588 - /commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Author: tn
Date: Thu Mar  8 21:19:56 2012
New Revision: 1298588

URL: http://svn.apache.org/viewvc?rev=1298588&view=rev
Log:
fixed spelling in NysiisTest

Modified:
    commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Modified: commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java
URL: http://svn.apache.org/viewvc/commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java?rev=1298588&r1=1298587&r2=1298588&view=diff
==============================================================================
--- commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java (original)
+++ commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java Thu Mar  8 21:19:56 2012
@@ -137,29 +137,29 @@ public class NysiisTest extends StringEn
         // Algorithm (taken from www.dropby.com/NYSIIS.html):
         //
         // 1.  Transcode first characters of name:
-        //    MAC »   MCC
-        //    KN  »   NN
-        //    K   »   C
-        //    PH  »   FF
-        //    PF  »   FF
-        //    SCH »   SSS
+        //    MAC >   MCC
+        //    KN  >   NN
+        //    K   >   C
+        //    PH  >   FF
+        //    PF  >   FF
+        //    SCH >   SSS
         //
         // 2.  Transcode last characters of name:
-        //    EE, IE  »   Y
-        //    DT,RT,RD,NT,ND  »   D
+        //    EE, IE  >   Y
+        //    DT,RT,RD,NT,ND  >   D
         //
         // 3.  First character of key = first character of name.
         //
         // 4.  Transcode remaining characters by following these rules, incrementing by one character each time:
-        //   4a.   EV  »   AF  else A,E,I,O,U » A
-        //   4b.   Q   »   G
-        //   4c.   Z   »   S
-        //   4d.   M   »   N
-        //   4e.   KN  »   N   else K » C
-        //   4f.   SCH     »   SSS
-        //   4g.   PH  »   FF
-        //   4h.   H   »   If previous or next is nonvowel, previous
-        //   4i.   W   »   If previous is vowel, previous
+        //   4a.   EV  >   AF  else A,E,I,O,U > A
+        //   4b.   Q   >   G
+        //   4c.   Z   >   S
+        //   4d.   M   >   N
+        //   4e.   KN  >   N   else K > C
+        //   4f.   SCH >   SSS
+        //   4g.   PH  >   FF
+        //   4h.   H   >   If previous or next is nonvowel, previous
+        //   4i.   W   >   If previous is vowel, previous
         //   4j.   Add current to key if current != last key character
         //
         // 5.  If last character is S, remove it
@@ -186,7 +186,7 @@ public class NysiisTest extends StringEn
                         new String[] { "PHILLIPSON", "FALAPSAN" }, // Original: FFALAP[SAN]
                         // violates 4j: see also KNUTH
                         new String[] { "PFEISTER", "FASTAR" }, // Original: FFASTA[R]
-                        // violoates 4j: see also KNUTH
+                        // violates 4j: see also KNUTH
                         new String[] { "SCHOENHOEFT", "SANAFT" }, // Original: SSANAF[T]
                         // http://www.dropby.com/indexLF.html?content=/NYSIIS.html
                         // 2.Transcode last characters of name: 
@@ -213,7 +213,7 @@ public class NysiisTest extends StringEn
                         // violates 4h: the H should be transcoded to S and thus ignored as
                         // the first key character is also S
                         new String[] { "SHRIVER", "SRAVAR" }, // Original: SHRAVA[R]
-                        // same as KOEHN, the L gets mysteriously lost, the correct one
+                        // same as KOEHN, the L gets mysteriously lost
                         new String[] { "KUHL", "CAL" }, // Original: C
                         new String[] { "RAWSON", "RASAN" },
                         // If last character is S, remove it
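
Step 2 of the algorithm comment in the diff above (transcode the last characters of the name) can be sketched as follows. This is a minimal illustration of that one step only, not the code in the Nysiis class; `SuffixSketch` and `transcodeSuffix` are hypothetical names chosen for this example, and the input is assumed to be upper case already.

```java
/** Illustrative only: SuffixSketch is a hypothetical name, not part of Commons Codec. */
final class SuffixSketch {

    /** Two-letter endings that step 2 replaces with a single D. */
    private static final String[] TO_D = {"DT", "RT", "RD", "NT", "ND"};

    /** Applies step 2 once: EE, IE become Y; DT, RT, RD, NT, ND become D. */
    static String transcodeSuffix(String name) {
        if (name.endsWith("EE") || name.endsWith("IE")) {
            return name.substring(0, name.length() - 2) + "Y";
        }
        for (String suffix : TO_D) {
            if (name.endsWith(suffix)) {
                return name.substring(0, name.length() - 2) + "D";
            }
        }
        // No rule matched, so the name is returned unchanged.
        return name;
    }

    public static void main(String[] args) {
        System.out.println(transcodeSuffix("BRANDT")); // BRAND
        System.out.println(transcodeSuffix("LEE"));    // LY
        System.out.println(transcodeSuffix("SMITH"));  // SMITH
    }
}
```

The rule is applied a single time here; whether an implementation should repeat it on the shortened name is not stated in the comment and is left open.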



Re: svn commit: r1298588 - /commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Posted by Gary Gregory <ga...@gmail.com>.
On Fri, Mar 9, 2012 at 2:06 PM, Thomas Neidhart
<th...@gmail.com> wrote:

> On 03/09/2012 07:58 PM, Gary Gregory wrote:
>
> > That's not a great name either :(
> >
> > Wikipedia says: "If longer than 6 characters, truncate to first 6
> > characters. (only needed for true NYSIIS, some versions use the full key)"
> >
> > What it does is "truncate to 6" but that is a pretty functional
> > description. I think a name that reflects "strict vs. not" or "normal vs.
> > long key" would be better.
> >
> > Should the default be to truncate to 6?
> >
> > Some choices:
> >
> > - strict is best IMO
> > - fullKey
> > - shortKey
>
> I also prefer strict. So I will make the change and complete the missing
> javadoc.
>

Great, thank you.

Gary


>
> Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
Spring Batch in Action: <http://s.apache.org/HOq>http://bit.ly/bqpbCK
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: svn commit: r1298588 - /commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Posted by Thomas Neidhart <th...@gmail.com>.
On 03/09/2012 07:58 PM, Gary Gregory wrote:

> That's not a great name either :(
> 
> Wikipedia says: "If longer than 6 characters, truncate to first 6
> characters. (only needed for true NYSIIS, some versions use the full key)"
> 
> What it does is "truncate to 6" but that is a pretty functional
> description. I think a name that reflects "strict vs. not" or "normal vs.
> long key" would be better.
> 
> Should the default be to truncate to 6?
> 
> Some choices:
> 
> - strict is best IMO
> - fullKey
> - shortKey

I also prefer strict. So I will make the change and complete the missing
javadoc.

Thomas



Re: svn commit: r1298588 - /commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Posted by Gary Gregory <ga...@gmail.com>.
On Fri, Mar 9, 2012 at 1:52 PM, Thomas Neidhart
<th...@gmail.com> wrote:

> On 03/09/2012 12:55 AM, Gary Gregory wrote:
>
> >>> This is why I added the 'trueLength' (lame name?) ivar because that seems
> >>> like a reasonable toggle after reading the Wikipedia entry.
>
> I changed it locally to 'cutOff', would you agree with this name change?
>

That's not a great name either :(

Wikipedia says: "If longer than 6 characters, truncate to first 6
characters. (only needed for true NYSIIS, some versions use the full key)"

What it does is "truncate to 6" but that is a pretty functional
description. I think a name that reflects "strict vs. not" or "normal vs.
long key" would be better.

Should the default be to truncate to 6?

Some choices:

- strict is best IMO
- fullKey
- shortKey
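
The toggle under discussion could look roughly like the sketch below. It assumes the key has already been computed and only shows the final "truncate to 6" decision; `StrictKeySketch` and `finishKey` are made-up names for illustration, not the Commons Codec API, and the flag name `strict` simply follows the preference stated above.

```java
/** Illustrative only: StrictKeySketch and finishKey are hypothetical names. */
final class StrictKeySketch {

    /** Key length of "true" NYSIIS, per the Wikipedia text quoted in this thread. */
    private static final int TRUE_LENGTH = 6;

    private final boolean strict;

    StrictKeySketch(boolean strict) {
        this.strict = strict;
    }

    /** Truncates an already computed key to 6 characters when strict; otherwise returns it as is. */
    String finishKey(String key) {
        if (strict && key.length() > TRUE_LENGTH) {
            return key.substring(0, TRUE_LENGTH);
        }
        return key;
    }

    public static void main(String[] args) {
        String fullKey = "FALAPSAN"; // full key expected for PHILLIPSON in NysiisTest
        System.out.println(new StrictKeySketch(true).finishKey(fullKey));  // FALAPS
        System.out.println(new StrictKeySketch(false).finishKey(fullKey)); // FALAPSAN
    }
}
```

A boolean named after the behaviour class ("strict") keeps the call site readable whichever default is chosen; the default itself remains the open question raised above.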

Gary


>
> Thomas
>



Re: svn commit: r1298588 - /commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Posted by Thomas Neidhart <th...@gmail.com>.
On 03/09/2012 12:55 AM, Gary Gregory wrote:

>>> This is why I added the 'trueLength' (lame name?) ivar because that seems
>>> like a reasonable toggle after reading the Wikipedia entry.

I changed it locally to 'cutOff', would you agree with this name change?

Thomas



Re: svn commit: r1298588 - /commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Posted by Gary Gregory <ga...@gmail.com>.
On Mar 8, 2012, at 18:04, Thomas Neidhart <th...@gmail.com> wrote:

> On 03/08/2012 11:48 PM, Gary Gregory wrote:
>
> [snip]
>
>> Would you be willing to handle this merge and describe the algorithm in
>> the Nysiis class itself? The description in the test method feels out of
>> place.
>>
>> After that, I think I'll put a message out on the ML and ask for further
>> testing and feedback.
>
> yes, I will do it, tomorrow ;-)

Great. See you tomorrow.

Gary

>
>>> I have not found the original paper, which is a pity, and all the
>>> algorithm descriptions I have found so far vary a bit. But in the end,
>>> it's a phonetic code to match similar names and when I compare to
>>> dropby, I feel more comfortable with our implementation (e.g. take PHIL
>>> and FIL which result in FFAL and FAL in dropby, which is weird).
>>>
>>> Anyway, the modified version seems to address some of these things, so
>>> it may be a good idea to additionally implement this one.
>>>
>>
>> What do you mean?
>>
>> IMO, we should have one impl that is documented. If it deviates from the
>> 'standard', then we should document that.
>
> that's fine for me, I just mentioned the modified variant, as Henri was
> referring to it in a comment to the issue.
>
>> This is why I added the 'trueLength' (lame name?) ivar because that seems
>> like a reasonable toggle after reading the Wikipedia entry.
>
> I wanted to add this myself, but you were quicker ;-)
>
> Good night,
>
> Thomas
>


Re: svn commit: r1298588 - /commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Posted by Thomas Neidhart <th...@gmail.com>.
On 03/08/2012 11:48 PM, Gary Gregory wrote:

[snip]

> Would you be willing to handle this merge and describe the algorithm in
> the Nysiis class itself? The description in the test method feels out of
> place.
> 
> After that, I think I'll put a message out on the ML and ask for further
> testing and feedback.

yes, I will do it, tomorrow ;-)

>> I have not found the original paper, which is a pity, and all the
>> algorithm descriptions I have found so far vary a bit. But in the end,
>> it's a phonetic code to match similar names and when I compare to
>> dropby, I feel more comfortable with our implementation (e.g. take PHIL
>> and FIL which result in FFAL and FAL in dropby, which is weird).
>>
>> Anyway, the modified version seems to address some of these things, so
>> it may be a good idea to additionally implement this one.
>>
> 
> What do you mean?
> 
> IMO, we should have one impl that is documented. If it deviates from the
> 'standard', then we should document that.

that's fine for me, I just mentioned the modified variant, as Henri was
referring to it in a comment to the issue.

> This is why I added the 'trueLength' (lame name?) ivar because that seems
> like a reasonable toggle after reading the Wikipedia entry.

I wanted to add this myself, but you were quicker ;-)

Good night,

Thomas



Re: svn commit: r1298588 - /commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Posted by Gary Gregory <ga...@gmail.com>.
On Thu, Mar 8, 2012 at 5:26 PM, Thomas Neidhart
<th...@gmail.com> wrote:

> On 03/08/2012 11:03 PM, Gary Gregory wrote:
> > Thomas:
> >
> > It seems to me that we do not need both testDropBy and testDropBy2.
> >
> > I initially created testDropBy2 as a way to work through the "Original" and
> > "Modified" examples from the site.
> >
> > So unless you think we need both, let's get rid of testDropBy.
> >
> > Thoughts?
>
> yes, definitely, the two should be merged. I worked through all
> deviations, and I think our implementation is correct (wrt the algorithm
> description, which may be wrong too).
>

Would you be willing to handle this merge and describe the algorithm in
the Nysiis class itself? The description in the test method feels out of
place.

After that, I think I'll put a message out on the ML and ask for further
testing and feedback.


>
> I have not found the original paper, which is a pity, and all the
> algorithm descriptions I have found so far vary a bit. But in the end,
> it's a phonetic code to match similar names and when I compare to
> dropby, I feel more comfortable with our implementation (e.g. take PHIL
> and FIL which result in FFAL and FAL in dropby, which is weird).
>
> Anyway, the modified version seems to address some of these things, so
> it may be a good idea to additionally implement this one.
>

What do you mean?

IMO, we should have one impl that is documented. If it deviates from the
'standard', then we should document that.

This is why I added the 'trueLength' (lame name?) ivar because that seems
like a reasonable toggle after reading the Wikipedia entry.

Gary

>
> Thomas
>



Re: svn commit: r1298588 - /commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Posted by Thomas Neidhart <th...@gmail.com>.
On 03/08/2012 11:03 PM, Gary Gregory wrote:
> Thomas:
> 
> It seems to me that we do not need both testDropBy and testDropBy2.
> 
> I initially created testDropBy2 as a way to work through the "Original" and
> "Modified" examples from the site.
> 
> So unless you think we need both, let's get rid of testDropBy.
> 
> Thoughts?

yes, definitely, the two should be merged. I worked through all
deviations, and I think our implementation is correct (wrt the algorithm
description, which may be wrong too).

I have not found the original paper, which is a pity, and all the
algorithm descriptions I have found so far vary a bit. But in the end,
it's a phonetic code to match similar names and when I compare to
dropby, I feel more comfortable with our implementation (e.g. take PHIL
and FIL which result in FFAL and FAL in dropby, which is weird).
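
To illustrate the PHIL/FIL point: if the prefix rule PH > FF is followed by a consistent application of rule 4j (skip a character equal to the last key character), both names end up with the same key. The sketch below is a deliberately reduced model that covers only step 1, step 3, the plain vowel part of 4a, and 4j; `PrefixDedupSketch` and its methods are hypothetical names, and its output for other names will generally not match full NYSIIS.

```java
import java.util.Locale;

/** Illustrative only: a reduced model of steps 1, 3, 4a (vowels only) and 4j. */
final class PrefixDedupSketch {

    /** Step 1 prefix rules from the test comment; KN is listed before K so it wins. */
    private static final String[][] PREFIX_RULES = {
        {"MAC", "MCC"}, {"KN", "NN"}, {"K", "C"}, {"PH", "FF"}, {"PF", "FF"}, {"SCH", "SSS"}
    };

    /** Step 1: rewrite the first characters of the name using the first matching rule. */
    static String transcodePrefix(String name) {
        for (String[] rule : PREFIX_RULES) {
            if (name.startsWith(rule[0])) {
                return rule[1] + name.substring(rule[0].length());
            }
        }
        return name;
    }

    /** Builds a key using only the reduced rule set described above. */
    static String reducedKey(String name) {
        String s = transcodePrefix(name.toUpperCase(Locale.ROOT));
        if (s.isEmpty()) {
            return s;
        }
        // Step 3: the first character of the key is the first character of the name.
        StringBuilder key = new StringBuilder().append(s.charAt(0));
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if ("AEIOU".indexOf(c) >= 0) {
                c = 'A'; // vowel part of rule 4a
            }
            if (c != key.charAt(key.length() - 1)) {
                key.append(c); // rule 4j: add only if different from the last key character
            }
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(reducedKey("PHIL")); // FAL
        System.out.println(reducedKey("FIL"));  // FAL
    }
}
```

Under this reading the second F produced by PH > FF is dropped by 4j, which is why FFAL for PHIL looks inconsistent with the published rules.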

Anyway, the modified version seems to address some of these things, so
it may be a good idea to additionally implement this one.

Thomas



Re: svn commit: r1298588 - /commons/proper/codec/trunk/src/test/java/org/apache/commons/codec/language/NysiisTest.java

Posted by Gary Gregory <ga...@gmail.com>.
Thomas:

It seems to me that we do not need both testDropBy and testDropBy2.

I initially created testDropBy2 as a way to work through the "Original" and
"Modified" examples from the site.

So unless you think we need both, let's get rid of testDropBy.

Thoughts?

Gary


