Posted to user@hive.apache.org by 丁桂涛(桂花) <di...@baixing.com> on 2014/07/23 07:34:19 UTC
Hive UDF gives duplicate result regardless of parameters, when nested
in a subquery
I recently developed a Hive Generic UDF, *getad*. It takes a map and a
string as parameters and returns a string. However, its output changes
depending on how the query is written.
Condition A:
select
    getad(map_col, 'tp') as tp,
    getad(map_col, 'p') as p,
    getad(map_col, 'sp') as sp
from
    table_name
where
    id = xxxx;
The output is right: 'tp', 'p', 'sp'.
Condition B:
select
    array(tp, p, sp) as ps
from
(
    select
        getad(map_col, 'tp') as tp,
        getad(map_col, 'p') as p,
        getad(map_col, 'sp') as sp
    from
        table_name
    where
        id = xxxx
) t;
The output is wrong: 'tp', 'tp', 'tp'. The following query, which has no
subquery, produces the same wrong result:
select
    array(
        getad(map_col, 'tp'),
        getad(map_col, 'p'),
        getad(map_col, 'sp')
    ) as ps
from
    table_name
where
    id = xxxx;
Could you please provide me some hints on this? Thanks!
--
丁桂涛
Re: Hive UDF gives duplicate result regardless of parameters, when
nested in a subquery
Posted by 丁桂涛(桂花) <di...@baixing.com>.
Yes. After setting hive.cache.expr.evaluation=false, all queries return
the expected results.
I also found that the problem is related to the getDisplayString function
in the UDF. At first, the function returned the same string regardless of
its parameters, and I had to set hive.cache.expr.evaluation=false.
After I changed the function to return a string that depends on its
parameters, all queries returned the expected results, even with
hive.cache.expr.evaluation set to true.
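To illustrate why the display string matters, here is a minimal sketch in
plain Java. It assumes, as the behaviour described above suggests, that
cached results are looked up by the expression's display string. It is a
toy model, not Hive's caching code: the names DisplayStringCacheSketch,
Call and evaluateWithCache are hypothetical and are not part of Hive's API,
and the row data is made up so that each key maps to itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model only: these are NOT Hive classes, and this is not Hive's caching code.
class DisplayStringCacheSketch {

    /** Stand-in for one call such as getad(map_col, 'tp'). */
    static final class Call {
        final String key;

        Call(String key) {
            this.key = key;
        }

        // Plays the role of evaluate(): look the key up in the row's map column.
        String evaluate(Map<String, String> mapCol) {
            return mapCol.get(key);
        }
    }

    /** Evaluates each call once per distinct display string and reuses the cached result. */
    static List<String> evaluateWithCache(List<Call> calls,
                                          Map<String, String> mapCol,
                                          Function<Call, String> displayString) {
        Map<String, String> cache = new HashMap<>();
        List<String> results = new ArrayList<>();
        for (Call call : calls) {
            results.add(cache.computeIfAbsent(displayString.apply(call),
                                              ignored -> call.evaluate(mapCol)));
        }
        return results;
    }

    // Runs the three calls from the thread against one hypothetical row.
    static List<String> run(Function<Call, String> displayString) {
        Map<String, String> mapCol = new HashMap<>();
        for (String k : Arrays.asList("tp", "p", "sp")) {
            mapCol.put(k, k); // assumed data: each value equals its key
        }
        List<Call> calls = Arrays.asList(new Call("tp"), new Call("p"), new Call("sp"));
        return evaluateWithCache(calls, mapCol, displayString);
    }

    // Display string that ignores the parameters: every call shares one cache entry.
    static List<String> constantDisplayString() {
        return run(call -> "getad()");
    }

    // Display string that includes the parameter: each call gets its own cache entry.
    static List<String> parameterAwareDisplayString() {
        return run(call -> "getad(map_col, '" + call.key + "')");
    }

    public static void main(String[] args) {
        System.out.println(constantDisplayString());       // [tp, tp, tp]
        System.out.println(parameterAwareDisplayString()); // [tp, p, sp]
    }
}
```

In this model the constant display string makes all three calls collide on
one cache entry, so the first result is reused, which matches the
'tp', 'tp', 'tp' output reported in the original question.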
Thanks, Navis. That really helped a lot.
Best Regards,
Guitao
--
丁桂涛
Re: Hive UDF gives duplicate result regardless of parameters, when
nested in a subquery
Posted by Navis류승우 <na...@nexr.com>.
Looks like it's caused by HIVE-7314. Could you try that with
"hive.cache.expr.evaluation=false"?
Thanks,
Navis
Re: Hive UDF gives duplicate result regardless of parameters, when
nested in a subquery
Posted by 丁桂涛(桂花) <di...@baixing.com>.
Yes. The output is correct: ["tp","p","sp"].
I developed the UDF in Java using Eclipse and exported the jar file into
Hive's auxlib directory. Then I added the following line to the
~/.hiverc file:
create temporary function getad as 'xxxxxxx';
The Hive version is 0.12.0. Perhaps the problem results from a
mis-optimization in Hive.
--
丁桂涛
Re: Hive UDF gives duplicate result regardless of parameters, when
nested in a subquery
Posted by Jie Jin <he...@gmail.com>.
Have you tried this query without the UDF? For example:
select
    array(tp, p, sp) as ps
from
(
    select
        'tp' as tp,
        'p' as p,
        'sp' as sp
    from
        table_name
    where
        id = xxxx
) t;
And how did you implement the UDF?
Thanks,
金杰 (Jie Jin)