Posted to user@hive.apache.org by 丁桂涛(桂花) <di...@baixing.com> on 2014/07/23 07:34:19 UTC

Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

I recently developed a Hive GenericUDF, *getad*. It takes a map and a
string as parameters and returns a string. However, its output changes
depending on how the query is written.

Condition A:

select
  getad(map_col, 'tp') as tp,
  getad(map_col, 'p') as p,
  getad(map_col, 'sp') as sp
from
  table_name
where
  id = xxxx;

The output is right: 'tp', 'p', 'sp'.

Condition B:

select
  array(tp, p, sp) as ps
from
  (
  select
    getad(map_col, 'tp') as tp,
    getad(map_col, 'p') as p,
    getad(map_col, 'sp') as sp
  from
    table_name
  where
    id = xxxx
  ) t;

The output is wrong: 'tp', 'tp', 'tp'. The following query, which has no
subquery, produces the same wrong result:

select
  array(
    getad(map_col, 'tp'),
    getad(map_col, 'p'),
    getad(map_col, 'sp')
  ) as ps
from
  table_name
where
  id = xxxx;

Could you please provide me some hints on this? Thanks!

-- 
丁桂涛

Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

Posted by 丁桂涛(桂花) <di...@baixing.com>.
Yes. After setting hive.cache.expr.evaluation=false, all queries return
the expected results.

I also found that the problem is related to the getDisplayString function
in the UDF. Originally the function returned the same string regardless of
its parameters, so I had to set hive.cache.expr.evaluation=false.

After I changed the function to return a string that depends on its
parameters, all queries returned the expected results even with
hive.cache.expr.evaluation set to true.
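
A minimal sketch of why this can happen, assuming the expression cache
identifies expressions by their display string. This is a simplified model,
not Hive's actual code: the class DisplayStringCacheSketch, the helper
methods, and the sample row are all hypothetical and only mirror the queries
in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DisplayStringCacheSketch {

    /** Hypothetical stand-in for the UDF: look up one key in the row's map. */
    static String getad(Map<String, String> mapCol, String key) {
        return mapCol.get(key);
    }

    /** Display string that ignores its arguments, like the first version of the UDF. */
    static String constantDisplayString(String... children) {
        return "getad()";
    }

    /** Display string that includes its arguments, like the fixed version. */
    static String argumentDisplayString(String... children) {
        return "getad(" + String.join(", ", children) + ")";
    }

    /**
     * Evaluates getad(map_col, key) for each key through a cache keyed by
     * display string (an assumed, simplified model of expression caching).
     */
    static List<String> evaluateRow(Map<String, String> mapCol,
                                    boolean useArguments,
                                    String... keys) {
        Map<String, String> cache = new HashMap<>();
        List<String> results = new ArrayList<>();
        for (String key : keys) {
            String literal = "'" + key + "'";
            String display = useArguments
                    ? argumentDisplayString("map_col", literal)
                    : constantDisplayString("map_col", literal);
            // Expressions with equal display strings share one cached result.
            results.add(cache.computeIfAbsent(display, d -> getad(mapCol, key)));
        }
        return results;
    }

    public static void main(String[] args) {
        // Sample row chosen so that the results match the outputs quoted in the thread.
        Map<String, String> row = Map.of("tp", "tp", "p", "p", "sp", "sp");
        System.out.println(evaluateRow(row, false, "tp", "p", "sp")); // [tp, tp, tp]
        System.out.println(evaluateRow(row, true, "tp", "p", "sp"));  // [tp, p, sp]
    }
}
```

With the constant display string all three calls collapse onto one cache
entry, so the first result is reused. Once the display string includes the
arguments, each call gets its own entry.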

Thanks, Navis. That really helped a lot.

Best Regards,

Guitao


On Thu, Jul 24, 2014 at 2:55 PM, Navis류승우 <na...@nexr.com> wrote:

> Looks like it's caused by HIVE-7314. Could you try that with
> "hive.cache.expr.evaluation=false"?
>
> Thanks,
> Navis


-- 
丁桂涛

Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

Posted by Navis류승우 <na...@nexr.com>.
Looks like it's caused by HIVE-7314. Could you try that with
"hive.cache.expr.evaluation=false"?

Thanks,
Navis


2014-07-24 14:34 GMT+09:00 丁桂涛(桂花) <di...@baixing.com>:

> Yes. The output is correct: ["tp","p","sp"].
>
> I developed the UDF in Java using Eclipse and exported the jar file into
> Hive's auxlib directory. Then I added the following line to the
> ~/.hiverc file:
>
> create temporary function getad as 'xxxxxxx';
>
> The Hive version is 0.12.0. Perhaps the problem is caused by a faulty
> optimization in Hive.

Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

Posted by 丁桂涛(桂花) <di...@baixing.com>.
Yes. The output is correct: ["tp","p","sp"].

I developed the UDF in Java using Eclipse and exported the jar file into
Hive's auxlib directory. Then I added the following line to the
~/.hiverc file:

create temporary function getad as 'xxxxxxx';

The Hive version is 0.12.0. Perhaps the problem is caused by a faulty
optimization in Hive.


On Thu, Jul 24, 2014 at 1:11 PM, Jie Jin <he...@gmail.com> wrote:

> Have you tried this query without the UDF, for example:
>
> select
>   array(tp, p, sp) as ps
> from
>   (
>   select
>     'tp' as tp,
>     'p' as p,
>     'sp' as sp
>   from
>     table_name
>   where
>     id = xxxx
>   ) t;
>
>
> And how did you implement the UDF?
>
>
> Thanks,
> 金杰 (Jie Jin)


-- 
丁桂涛

Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

Posted by Jie Jin <he...@gmail.com>.
Have you tried this query without the UDF, for example:

select
  array(tp, p, sp) as ps
from
  (
  select
    'tp' as tp,
    'p' as p,
    'sp' as sp
  from
    table_name
  where
    id = xxxx
  ) t;


And how did you implement the UDF?


Thanks,
金杰 (Jie Jin)

