Posted to dev@hive.apache.org by Krishna <re...@gmail.com> on 2013/02/23 06:48:24 UTC

HIVE-4053 | Review request

Hi,

I've implemented the 'Refined Soundex' algorithm as a GenericUDF and would
like to share it for review by the experts, as I'm a newbie.

Change details:
A new Java class was created: GenericUDFRefinedSoundex.java
An entry was added to FunctionRegistry.java:
registerGenericUDF("soundex_ref", GenericUDFRefinedSoundex.class);

Both files are attached to the email.

I'm planning to implement other phonetic algorithms and submit them all as a
single patch. I understand there are many other steps I need to finish
before a patch is ready, but for now, if you could review the attached code
and provide feedback, that would be great.

Here are the details of the Refined Soundex algorithm:
The first letter is kept as-is.
Then every letter, including the first, is replaced by a digit as defined below:
 * B, P => 1
 * F, V => 2
 * C, K, S => 3
 * G, J => 4
 * Q, X, Z => 5
 * D, T => 6
 * L => 7
 * M, N => 8
 * R => 9
 * Other letters => 0
Consecutive letters belonging to the same group are replaced by a single digit.
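For reviewers, the rules above can be sketched as a standalone Java method, independent of Hive. This is a minimal sketch under stated assumptions, not the attached GenericUDFRefinedSoundex: the class name RefinedSoundexSketch is hypothetical, and skipping non-letter characters is an assumption the message does not specify. Following the example below, the first letter is kept and also encoded.

```java
import java.util.Locale;

// Hypothetical standalone sketch of the Refined Soundex rules above;
// not the attached GenericUDFRefinedSoundex implementation.
public class RefinedSoundexSketch {

    // Digit for each letter A..Z, per the table:
    // B,P=1  F,V=2  C,K,S=3  G,J=4  Q,X,Z=5  D,T=6  L=7  M,N=8  R=9  others=0
    private static final String CODES = "01360240043788015936020505";

    public static String encode(String input) {
        if (input == null) {
            return null; // mirror SQL semantics: NULL in, NULL out
        }
        StringBuilder out = new StringBuilder();
        char lastCode = 0; // no digit emitted yet
        for (char ch : input.toUpperCase(Locale.ROOT).toCharArray()) {
            if (ch < 'A' || ch > 'Z') {
                continue; // assumption: non-letters are ignored
            }
            if (out.length() == 0) {
                out.append(ch); // keep the first letter as-is
            }
            char code = CODES.charAt(ch - 'A');
            if (code != lastCode) { // collapse consecutive letters of one group
                out.append(code);
                lastCode = code;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Carren")); // prints C30908
    }
}
```

Running main should print C30908, matching the soundex_ref('Carren') example. Apache Commons Codec also ships org.apache.commons.codec.language.RefinedSoundex, which uses the same letter-to-digit table and may be useful for cross-checking results.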

Example:
> SELECT soundex_ref('Carren') FROM src LIMIT 1;
> C30908

Thanks,
Krishna

Re: HIVE-4053 | Review request

Posted by Mark Grover <gr...@gmail.com>.
Krishna,
Could you please attach a patch to the JIRA and post a review request on
Review Board? You should also consider adding some unit tests. If you
need help with any of this, please let us know.

I will post this on JIRA as well for completeness.

Mark
