Posted to issues@commons.apache.org by "Eyal Allweil (JIRA)" <ji...@apache.org> on 2016/09/13 08:09:20 UTC

[jira] [Created] (LANG-1266) Add alphabet converter

Eyal Allweil created LANG-1266:
----------------------------------

             Summary: Add alphabet converter
                 Key: LANG-1266
                 URL: https://issues.apache.org/jira/browse/LANG-1266
             Project: Commons Lang
          Issue Type: New Feature
          Components: lang.text.*
            Reporter: Eyal Allweil


(as described in [the mailing list|http://mail-archives.apache.org/mod_mbox/commons-dev/201609.mbox/%3c289983494.3057706.1472720010277@mail.yahoo.com%3e])

This is a utility class I wrote for converting text from one alphabet to another, for example from Unicode to Latin while avoiding some of the Latin chars. The usage looks like this:

{code}
Set<Character> originals; // a, b, c, d
Set<Character> encoding; // 0, 1, d
Set<Character> doNotEncode; // d

AlphabetConverter ac = AlphabetConverter.createConverter(originals, encoding, doNotEncode);

ac.encode("a"); // 00
ac.encode("b"); // 01
ac.encode("c"); // 0d
ac.encode("d"); // d
ac.encode("abcd"); // 00010dd

{code}

Of course, x.equals(ac.decode(ac.encode(x))) should always be true.

The provided implementation gives every encoded char a code of the same fixed length, except for the "do not encode" chars, which are passed through unchanged (length one).
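
To illustrate how such a fixed-length scheme can work, here is a minimal sketch. It is not the attached implementation: the class name FixedLengthCodeSketch and the code-assignment rule are assumptions. In particular, it assumes that the first symbol of every code is taken only from encoding chars outside the "do not encode" set, so the decoder can tell a code from a pass-through char by its first symbol. It also works on Java chars, not code points.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a fixed-length alphabet code; not the class attached to this issue. */
final class FixedLengthCodeSketch {
    private final Map<Character, String> originalToEncoded = new HashMap<>();
    private final Map<String, Character> encodedToOriginal = new HashMap<>();
    private final Set<Character> doNotEncode;
    private final int codeLength;

    /** Pass ordered sets (e.g. LinkedHashSet) to get deterministic code assignment. */
    FixedLengthCodeSketch(Set<Character> originals, Set<Character> encoding, Set<Character> doNotEncode) {
        this.doNotEncode = new HashSet<>(doNotEncode);
        List<Character> all = new ArrayList<>(encoding);
        // Assumption: a code never starts with a pass-through char.
        List<Character> first = new ArrayList<>(encoding);
        first.removeAll(doNotEncode);
        List<Character> toEncode = new ArrayList<>(originals);
        toEncode.removeAll(doNotEncode);
        if (!toEncode.isEmpty() && (first.isEmpty() || (all.size() < 2 && toEncode.size() > 1))) {
            throw new IllegalArgumentException("Encoding alphabet is too small");
        }
        // Smallest length L with first.size() * all.size()^(L-1) >= number of chars to encode.
        int length = 1;
        long capacity = first.size();
        while (capacity < toEncode.size()) {
            capacity *= all.size();
            length++;
        }
        this.codeLength = length;
        for (int i = 0; i < toEncode.size(); i++) {
            // Write i as a mixed-radix number: trailing digits over 'all', leading digit over 'first'.
            char[] code = new char[length];
            int rest = i;
            for (int pos = length - 1; pos > 0; pos--) {
                code[pos] = all.get(rest % all.size());
                rest /= all.size();
            }
            code[0] = first.get(rest);
            originalToEncoded.put(toEncode.get(i), new String(code));
            encodedToOriginal.put(new String(code), toEncode.get(i));
        }
        for (Character c : originals) {
            if (this.doNotEncode.contains(c)) {
                originalToEncoded.put(c, String.valueOf(c)); // pass-through, length one
            }
        }
    }

    String encode(String original) {
        StringBuilder sb = new StringBuilder();
        for (char c : original.toCharArray()) {
            String code = originalToEncoded.get(c);
            if (code == null) {
                throw new IllegalArgumentException("Not in the original alphabet: " + c);
            }
            sb.append(code);
        }
        return sb.toString();
    }

    String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (doNotEncode.contains(c)) {
                sb.append(c); // pass-through char
                i++;
                continue;
            }
            if (i + codeLength > encoded.length()) {
                throw new IllegalArgumentException("Truncated code at index " + i);
            }
            Character original = encodedToOriginal.get(encoded.substring(i, i + codeLength));
            if (original == null) {
                throw new IllegalArgumentException("Unknown code at index " + i);
            }
            sb.append(original.charValue());
            i += codeLength;
        }
        return sb.toString();
    }
}
```

With originals a, b, c, d, encoding 0, 1, d and doNotEncode d (passed as LinkedHashSets in that order), this sketch assigns the same codes as the usage example above, so encode("abcd") gives 00010dd and decode reverses it.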

In addition, to make it easier to preserve the encoding scheme, I've added a human-readable toString implementation and a factory method that can recreate an AlphabetConverter from the encoding map, such that:

{code}
AlphabetConverter ac; // any existing converter

ac.equals(AlphabetConverter.createConverterFromMap(ac.getOriginalToEncoded())); // should always be true
{code}
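
As a rough illustration of why the encoding map alone is enough to rebuild a converter, here is a minimal sketch. The class MapRebuildSketch is hypothetical and is not the attached AlphabetConverter. It assumes that an entry whose code equals its own char is a "do not encode" char, that all other codes share one length, and that no code starts with a "do not encode" char.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: rebuilds a decoder from an original-to-encoded map alone. */
final class MapRebuildSketch {
    private final Map<Character, String> originalToEncoded;
    private final Map<String, Character> encodedToOriginal = new HashMap<>();
    private final Set<Character> passThrough = new HashSet<>();
    private int codeLength = 1;

    MapRebuildSketch(Map<Character, String> map) {
        this.originalToEncoded = new LinkedHashMap<>(map);
        for (Map.Entry<Character, String> e : originalToEncoded.entrySet()) {
            String code = e.getValue();
            if (encodedToOriginal.put(code, e.getKey()) != null) {
                throw new IllegalArgumentException("Duplicate code: " + code);
            }
            // Assumption: a char that maps to itself is a "do not encode" char.
            if (code.equals(String.valueOf(e.getKey()))) {
                passThrough.add(e.getKey());
            }
            codeLength = Math.max(codeLength, code.length());
        }
    }

    Map<Character, String> getOriginalToEncoded() {
        return Collections.unmodifiableMap(originalToEncoded);
    }

    String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            // Pass-through chars occupy one position; every other code has the fixed length.
            int len = passThrough.contains(encoded.charAt(i)) ? 1 : codeLength;
            if (i + len > encoded.length()) {
                throw new IllegalArgumentException("Truncated code at index " + i);
            }
            Character original = encodedToOriginal.get(encoded.substring(i, i + len));
            if (original == null) {
                throw new IllegalArgumentException("Unknown code at index " + i);
            }
            sb.append(original.charValue());
            i += len;
        }
        return sb.toString();
    }

    // Two sketches are equal when they describe the same encoding map.
    @Override
    public boolean equals(Object o) {
        return o instanceof MapRebuildSketch
                && originalToEncoded.equals(((MapRebuildSketch) o).originalToEncoded);
    }

    @Override
    public int hashCode() {
        return originalToEncoded.hashCode();
    }
}
```

Because equality here depends only on the map, a sketch rebuilt from getOriginalToEncoded() compares equal to the one it came from, which mirrors the property stated above.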


