Posted to user@hive.apache.org by wd <wd...@wdicc.com> on 2011/08/15 10:49:40 UTC

slow performance when using udf

hi,

I created a UDF to decode URL-encoded strings, but the MapReduce job now
takes about 3 times as long as before (73 sec -> 213 sec). How can I optimize it?

package com.test.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import java.net.URLDecoder;

public final class urldecode extends UDF {

    public String evaluate(final String s) {
        if (s == null) { return null; }
        return getString(s);
    }

    public static String getString(String s) {
        String a;
        try {
            a = URLDecoder.decode(s);
        } catch ( Exception e) {
            a = "";
        }
        return a;
    }

    public static void main(String args[]) {
        String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
        System.out.println( getString(t) );
    }
}

Re: slow performance when using udf

Posted by wd <wd...@wdicc.com>.

Finally, the following code shows no performance loss. I think the key
was to avoid using the getString method. Thanks again, everyone.

//import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.net.URLDecoder;

public final class urldecode extends UDF {

    private Text t = new Text();

    public Text evaluate(Text s) {
        if (s == null) { return null; }
        try {
            t.set( URLDecoder.decode( s.toString(), "UTF-8" ));
            return t;
        } catch ( Exception e) {
            return null;
        }
    }

    //public static void main(String args[]) {
        //String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
        //System.out.println( getString(t) );
    //}
}
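
For anyone who wants to check the decoding step without a Hive or Hadoop classpath, here is a minimal stand-alone sketch. The class name UrlDecodeSketch and the method decodeOrNull are made up for illustration and are not part of Hive; the sketch only mirrors the null-on-failure behaviour and the explicit "UTF-8" argument of the UDF above, using plain String instead of Text.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper (not part of Hive): isolates the decode step
// so it can be exercised without a cluster.
final class UrlDecodeSketch {

    private UrlDecodeSketch() { }

    // Decodes a URL-encoded string as UTF-8.
    // Returns null for null input or for malformed escapes such as "%zz".
    static String decodeOrNull(String s) {
        if (s == null) {
            return null;
        }
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            // URLDecoder throws IllegalArgumentException on malformed input.
            return null;
        }
    }

    public static void main(String[] args) {
        // Sample string taken from the original post.
        System.out.println(decodeOrNull("%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A"));
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding, which the one-argument URLDecoder.decode(String) relies on.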


On Tue, Aug 16, 2011 at 10:47 AM, wd <wd...@wdicc.com> wrote:
> Thanks for all your advice, I'll try it out.
>
> On Mon, Aug 15, 2011 at 9:02 PM, Edward Capriolo <ed...@gmail.com> wrote:
>>
>>
>> On Monday, August 15, 2011, Carl Steinbach <ca...@cloudera.com> wrote:
>>> Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF)
>>> should help some with performance.
>>
>> Also you should use class-level private members to save on object
>> instantiation and garbage collection.
>>
>> You also get benefits by matching the args with what you would normally
>> expect from upstream. Hive converts text to string when needed, but if the
>> data normally coming into the method is text you could try and match the
>> argument and see if it is any faster.
>

Re: slow performance when using udf

Posted by wd <wd...@wdicc.com>.

Thanks for all your advice, I'll try it out.

Re: slow performance when using udf

Posted by Edward Capriolo <ed...@gmail.com>.

Also you should use class-level private members to save on object
instantiation and garbage collection.

You also get benefits by matching the args with what you would normally
expect from upstream. Hive converts text to string when needed, but if the
data normally coming into the method is text you could try and match the
argument and see if it is any faster.
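
A rough sketch of the class-level member idea, written against plain Java so it runs without Hadoop. ReusableValue is a hypothetical stand-in for a mutable holder such as Hadoop's Text, and ReusingDecoder is likewise invented for illustration; neither is a Hive class. The decoded String is still allocated on every call, so this only avoids creating a new wrapper object per row.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical stand-in for a mutable, reusable value holder.
final class ReusableValue {
    private String value = "";

    void set(String v) { value = v; }

    @Override
    public String toString() { return value; }
}

// Hypothetical decoder that keeps one output object for its whole lifetime.
final class ReusingDecoder {

    // Allocated once per decoder instance, not once per call.
    private final ReusableValue out = new ReusableValue();

    ReusableValue evaluate(String s) {
        if (s == null) {
            return null;
        }
        try {
            out.set(URLDecoder.decode(s, "UTF-8"));
            return out; // same instance on every successful call
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return null;
        }
    }
}
```

Because the same instance is handed back each time, a caller has to read or copy the value before the next call overwrites it.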

Re: slow performance when using udf

Posted by Carl Steinbach <ca...@cloudera.com>.

Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF)
should help some with performance.
