Posted to issues@commons.apache.org by "Arturo Bernal (Jira)" <ji...@apache.org> on 2021/01/15 12:33:01 UTC

[jira] [Commented] (CODEC-249) Incorrect transform of CH digraph according basic rules

    [ https://issues.apache.org/jira/browse/CODEC-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266005#comment-17266005 ] 

Arturo Bernal commented on CODEC-249:
-------------------------------------

Hi [~Kanaduchi]

IMO you're right. There is a problem, but I think it is not limited to the handling of CH.
Also, for some reason I don't know, the result is limited to 4 characters.

assertEquals("SKMT", this.getStringEncoder().metaphone("SCHEMATIC")); should be --> SXMTK
assertEquals("KRKT", this.getStringEncoder().metaphone("CHARACTER")); should be --> XRKTR
assertEquals("AKSK", this.getStringEncoder().metaphone("AXEAXE")); should be --> AKSKS
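
The four-letter results match the Commons Codec Metaphone encoder's maxCodeLen property, which defaults to 4 and can be changed with setMaxCodeLen(int). Below is a minimal sketch that applies the same limit to the expected keys listed above. The class and helper names are hypothetical, it is not library code, and it assumes the limit's net effect is to keep a prefix of the full key.

```java
// Hypothetical sketch: models a maximum code length applied to Metaphone keys.
// The full keys are the expected values from the comment above.
class MaxCodeLenSketch {

    /** Keeps at most maxCodeLen characters of a full key (hypothetical helper). */
    static String limit(String fullKey, int maxCodeLen) {
        return fullKey.length() > maxCodeLen ? fullKey.substring(0, maxCodeLen) : fullKey;
    }

    public static void main(String[] args) {
        String[] fullKeys = {"SXMTK", "XRKTR", "AKSKS"};
        for (String key : fullKeys) {
            // With a limit of 4, each five-letter key loses its last letter.
            System.out.println(key + " -> " + limit(key, 4));
        }
    }
}
```

Running it prints SXMTK -> SXMT, XRKTR -> XRKT and AKSKS -> AKSK. Only the AXEAXE case then matches the actual output. The other two still differ (the second letter for SCHEMATIC, the first for CHARACTER), so those mismatches are not explained by the length limit alone.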

> Incorrect transform of CH digraph according basic rules
> -------------------------------------------------------
>
>                 Key: CODEC-249
>                 URL: https://issues.apache.org/jira/browse/CODEC-249
>             Project: Commons Codec
>          Issue Type: Bug
>            Reporter: Andrey
>            Priority: Major
>
> I detected an incorrect transform of the CH digraph by the Metaphone algorithm.
> According to _Lawrence Philips_, CH should be transformed to 'X':
> {code:java}
> 'C' transforms to 'X' if followed by 'IA' or 'H' (unless in latter case, it is part of '-SCH-', in which case it transforms to 'K'). 'C' transforms to 'S' if followed by 'I', 'E', or 'Y'. Otherwise, 'C' transforms to 'K'.
> {code}
> But in the Apache implementation I see:
> {code:java}
> if (isNextChar(local, n, 'H')) { // detect CH
>                         if (n == 0 &&
>                             wdsz >= 3 &&
>                             isVowel(local,2) ) { // CH consonant -> K consonant
>                             code.append('K');
>                         } else {
>                             code.append('X'); // CHvowel -> X
>                         }
> {code}
> So after the transformation I get 'K' instead of 'X'.
> *Example*: CHERI should be transformed to 'XR', but I get 'KR', which is wrong.
> This bug has major priority because of its large impact on the results of the Metaphone algorithm.
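
To make the difference concrete, here is a minimal sketch that encodes only the 'C' rule quoted in the description next to the CH branch quoted from the Apache code. The class and method names are hypothetical and this is not the library's code. quotedChBranch models only the quoted snippet, and it assumes the next character is already known to be 'H', so the demo sticks to word-initial CH.

```java
// Hypothetical sketch comparing the quoted Philips 'C' rule with the quoted
// Commons Codec CH branch. Not the library's implementation.
class ChDigraphSketch {

    private static boolean isVowel(String w, int i) {
        return i >= 0 && i < w.length() && "AEIOU".indexOf(w.charAt(i)) >= 0;
    }

    private static boolean hasCharAt(String w, int i, char c) {
        return i >= 0 && i < w.length() && w.charAt(i) == c;
    }

    /** Code for the 'C' at index n, read directly from the quoted rule. */
    static char ruleForC(String w, int n) {
        if (hasCharAt(w, n + 1, 'I') && hasCharAt(w, n + 2, 'A')) {
            return 'X'; // C followed by IA -> X
        }
        if (hasCharAt(w, n + 1, 'H')) {
            return hasCharAt(w, n - 1, 'S') ? 'K' : 'X'; // -SCH- -> K, otherwise CH -> X
        }
        if (hasCharAt(w, n + 1, 'I') || hasCharAt(w, n + 1, 'E') || hasCharAt(w, n + 1, 'Y')) {
            return 'S'; // C followed by I, E or Y -> S
        }
        return 'K'; // otherwise -> K
    }

    /** The quoted CH branch; assumes the character after n is 'H'. */
    static char quotedChBranch(String w, int n) {
        if (n == 0 && w.length() >= 3 && isVowel(w, 2)) {
            return 'K'; // word-initial CH followed by a vowel -> K
        }
        return 'X';
    }

    public static void main(String[] args) {
        for (String word : new String[] {"CHERI", "CHARACTER", "CHURCH"}) {
            // prints e.g. "CHERI: rule=X, quoted branch=K"
            System.out.printf("%s: rule=%c, quoted branch=%c%n",
                    word, ruleForC(word, 0), quotedChBranch(word, 0));
        }
    }
}
```

For CHERI, CHARACTER and CHURCH the character at index 2 is a vowel, so the quoted condition picks 'K' where the quoted rule asks for 'X'.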



--
This message was sent by Atlassian Jira
(v8.3.4#803005)