Posted to user@hive.apache.org by Alberto Ramón <a....@gmail.com> on 2018/05/09 17:23:04 UTC

Partition Pruning using UDF

Hello

We have a UDF, 'FindPartition', that selects the correct partition to read:
Select * from TB where partitionCol = FindPartition();

How can I avoid a full scan of all partitions?


(Set MyPartition=FindPartition();  // Is not valid in Hive)

Re: Partition Pruning using UDF

Posted by Alberto Ramón <a....@gmail.com>.
Is this behavior for Hive on Tez only, or for Hive on MapReduce too?

On Tez, I think yes: it evaluates functions and sub-queries, then recompiles
the query so that partition pruning applies.
But in CDH, everything is compiled just once.


On 16 May 2018 at 07:58, Furcy Pin <pi...@gmail.com> wrote:

> Yes, I believe that's what Constant Object Inspectors are used for.
>
> The initialize method returns object inspectors that are used at query
> compilation time to check the type safety and perform potential
> optimizations. If you return a StringConstantObjectInspector containing the
> value that your method is supposed to return, the Hive optimiser will know
> that it can safely perform partition pruning on it.
>
> On Tue, 15 May 2018, 23:32 Alberto Ramón, <a....@gmail.com>
> wrote:
>
>> Yes, I checked: by default all UDFs are deterministic (LINK
>> <https://hive.apache.org/javadocs/r1.2.2/api/org/apache/hadoop/hive/ql/udf/UDFType.html>
>> )
>>
>> I think that I need something like 'eager evaluation': evaluate the UDFs
>> before building the physical plan (otherwise you can't do partition pruning).
>>
>> On 15 May 2018 at 09:21, Furcy Pin <pi...@gmail.com> wrote:
>>
>>> Hi Alberto,
>>>
>>>
>>> If I'm not mistaken, to make sure that this works you need to give the
>>> proper annotation in your UDF code (deterministic, and maybe some others).
>>> You may also need to return a Constant Object Inspector in the initialize
>>> method so that Hive knows that it can perform partition pruning with it.
>>>
>>> On Wed, 9 May 2018, 19:23 Alberto Ramón, <a....@gmail.com>
>>> wrote:
>>>
>>>> Hello
>>>>
>>>> We have a UDF, 'FindPartition', that selects the correct partition to read:
>>>> Select * from TB where partitionCol = FindPartition();
>>>>
>>>> How can I avoid a full scan of all partitions?
>>>>
>>>>
>>>> (Set MyPartition=FindPartition();  // Is not valid in Hive)
>>>>
>>>
>>

Re: Partition Pruning using UDF

Posted by Furcy Pin <pi...@gmail.com>.
Yes, I believe that's what Constant Object Inspectors are used for.

The initialize method returns object inspectors that are used at query
compilation time to check the type safety and perform potential
optimizations. If you return a StringConstantObjectInspector containing the
value that your method is supposed to return, the Hive optimiser will know
that it can safely perform partition pruning on it.
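
As a rough illustration of the idea above, here is a toy model in plain Java.
It does not use Hive's classes: ResultInspector, opaque, constant and
partitionsToScan are hypothetical names invented for this sketch, and the real
optimiser is far more involved. It only shows why a planner that is handed a
constant at compile time can drop partitions, while one that only learns the
result type has to keep them all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Toy model only: none of these types are Hive classes. */
class PruningSketch {

    /** Hypothetical stand-in for what a UDF reports to the planner at compile time. */
    interface ResultInspector {
        /** The result value, if it is already known at compile time. */
        Optional<String> constantValue();
    }

    /** The planner learns nothing about the value; it is only known at run time. */
    static ResultInspector opaque() {
        return () -> Optional.empty();
    }

    /** The planner is told the exact value the function will return. */
    static ResultInspector constant(String value) {
        return () -> Optional.of(value);
    }

    /** Plan-time partition selection for a predicate of the form partitionCol = udf(). */
    static List<String> partitionsToScan(List<String> all, ResultInspector inspector) {
        Optional<String> known = inspector.constantValue();
        if (!known.isPresent()) {
            // Value unknown while planning: every partition must be scanned
            // and the filter applied row by row at run time.
            return new ArrayList<>(all);
        }
        // Value known while planning: keep only the matching partition.
        List<String> kept = new ArrayList<>();
        for (String partition : all) {
            if (partition.equals(known.get())) {
                kept.add(partition);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("2018-05-14", "2018-05-15", "2018-05-16");
        System.out.println(partitionsToScan(partitions, opaque()));
        System.out.println(partitionsToScan(partitions, constant("2018-05-15")));
    }
}
```

With the opaque inspector the sketch keeps all three partitions; with the
constant one it keeps only 2018-05-15.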


Re: Partition Pruning using UDF

Posted by Alberto Ramón <a....@gmail.com>.
Yes, I checked: by default all UDFs are deterministic (LINK
<https://hive.apache.org/javadocs/r1.2.2/api/org/apache/hadoop/hive/ql/udf/UDFType.html>
)

I think that I need something like 'eager evaluation': evaluate the UDFs
before building the physical plan (otherwise you can't do partition pruning).
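
One way to get 'eager evaluation' without relying on the optimiser is to do it
on the client: compute the partition value first, then send Hive a query that
contains it as a literal, since a literal predicate on a partition column can
be pruned at compile time. The sketch below is an assumption-laden example, not
a feature of Hive: EagerPartitionQuery and build are hypothetical names, the
Supplier stands in for whatever logic FindPartition implements, and actually
submitting the query (for example over JDBC) is left out.

```java
import java.util.function.Supplier;
import java.util.regex.Pattern;

/** Hypothetical client-side helper; not part of Hive. */
class EagerPartitionQuery {

    // Conservative whitelist so the value can be embedded as a literal safely.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_.:-]+");

    /** Evaluates the partition lookup once, then builds a query with a literal predicate. */
    static String build(String table, String partitionCol, Supplier<String> findPartition) {
        String value = findPartition.get(); // evaluated eagerly, before Hive sees the query
        if (value == null || !SAFE.matcher(value).matches()) {
            throw new IllegalArgumentException("Unsafe partition value: " + value);
        }
        return "SELECT * FROM " + table + " WHERE " + partitionCol + " = '" + value + "'";
    }

    public static void main(String[] args) {
        // Prints: SELECT * FROM TB WHERE partitionCol = '2018-05-15'
        System.out.println(build("TB", "partitionCol", () -> "2018-05-15"));
    }
}
```

The trade-off is that the lookup logic has to be reachable from the client
rather than living only inside a UDF.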


Re: Partition Pruning using UDF

Posted by Furcy Pin <pi...@gmail.com>.
Hi Alberto,


If I'm not mistaken, to make sure that this works you need to give the
proper annotation in your UDF code (deterministic, and maybe some others).
You may also need to return a Constant Object Inspector in the initialize
method so that Hive knows that it can perform partition pruning with it.
