Posted to user@hive.apache.org by "kulkarni.swarnim@gmail.com" <ku...@gmail.com> on 2012/10/15 18:39:09 UTC

Partitions on hive hbase table

All,

So, I have an external table in Hive backed by a huge HBase table. I was
wondering what the best practices are for partitioning my data so that my
queries do not always have to do a full-table scan.

Some quick research turned up two approaches: create the partitions
explicitly and then load data into them, or use dynamic partitions.

Is there any way to limit the scans based on start and stop row keys? Also,
if I decide to go with dynamic partitions, how do I keep the data in my
partitioned tables up to date?

Thanks for any help.

-- 
Swarnim

Re: Partitions on hive hbase table

Posted by bharath vissapragada <bh...@gmail.com>.
That patch (the one about non-key columns) hasn't been reviewed yet. Maybe
you can try it and let me know if you face any problems!


On Thu, Oct 18, 2012 at 12:56 AM, kulkarni.swarnim@gmail.com <
kulkarni.swarnim@gmail.com> wrote:

-- 
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v

Re: Partitions on hive hbase table

Posted by "kulkarni.swarnim@gmail.com" <ku...@gmail.com>.
Thanks for the reply, Bharath. It was helpful.

Looking into HIVE-1643 for range-scan support: is the patch ready to be
consumed, or will it still go through some modifications?

Thanks,

On Mon, Oct 15, 2012 at 8:56 PM, bharath vissapragada <
bharathvissapragada1990@gmail.com> wrote:

Re: Partitions on hive hbase table

Posted by bharath vissapragada <bh...@gmail.com>.
Hi,

If your queries have select predicates on the HBase row key of the form
row-key > x, row-key < y, or row-key < x and row-key > y, those values are
used to build the correct scan object. For example, the query would look
something like this:

  select <something> from <hbase-hive-table>
  where row-key < x and row-key > y and ... <so-on>

(Just include such predicates on the HBase row key.)
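A rough Python sketch of how ANDed row-key predicates like these could be folded into one scan range follows. It is an illustration under stated assumptions, not the Hive HBase handler's code: row keys are compared as raw bytes (true for string keys; binary-encoded numeric keys may sort differently), the names `key_after` and `scan_bounds` are hypothetical, and the result follows HBase's scan convention of an inclusive start row and an exclusive stop row. Appending a zero byte yields the smallest key that sorts strictly after a given key, which is how `>` and `<=` are turned into those bounds.

```python
from typing import Iterable, Optional, Tuple


def key_after(key: bytes) -> bytes:
    """Smallest byte string that sorts strictly after `key` (hypothetical helper)."""
    return key + b"\x00"


def scan_bounds(predicates: Iterable[Tuple[str, bytes]]) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Fold ANDed row-key predicates into (start_row, stop_row).

    start_row is inclusive and stop_row exclusive; None means that side
    is unbounded (a full-table scan when both are None).
    """
    start: Optional[bytes] = None
    stop: Optional[bytes] = None
    for op, value in predicates:
        if op in (">", ">="):
            # key > v  is the same as  key >= v + b"\x00"
            lo = key_after(value) if op == ">" else value
            start = lo if start is None else max(start, lo)  # keep the tighter lower bound
        elif op in ("<", "<="):
            # key <= v  is the same as  key < v + b"\x00"
            hi = key_after(value) if op == "<=" else value
            stop = hi if stop is None else min(stop, hi)  # keep the tighter upper bound
        else:
            raise ValueError(f"unsupported row-key operator: {op}")
    return start, stop


# "where row-key < x and row-key > y" with hypothetical values for x and y:
print(scan_bounds([("<", b"2012-10-16"), (">", b"2012-10-14")]))
# -> (b'2012-10-14\x00', b'2012-10-16')
```

Taking the max of the lower bounds and the min of the upper bounds is what lets several ANDed predicates narrow a single range; OR-ed predicates would need more than one range and are outside this sketch.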

Thanks



On Mon, Oct 15, 2012 at 10:27 PM, kulkarni.swarnim@gmail.com <
kulkarni.swarnim@gmail.com> wrote:

Re: Partitions on hive hbase table

Posted by "kulkarni.swarnim@gmail.com" <ku...@gmail.com>.
Thanks, Bharath.

Do you have an example of what such a query would look like?

Thanks,

On Mon, Oct 15, 2012 at 11:48 AM, bharath vissapragada <
bharathvissapragada1990@gmail.com> wrote:

Re: Partitions on hive hbase table

Posted by bharath vissapragada <bh...@gmail.com>.
I'm not sure about partitioning, but scans are currently limited based on
start and stop keys (if predicates on row keys are provided in the query).

See the HIVE-1643 and HIVE-2815 JIRAs!
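To make the idea concrete, here is a minimal Python sketch (not Hive's or HBase's code) of why a start/stop-key bound avoids a full-table scan. It relies only on how HBase stores data: row keys are kept sorted in byte order, and a scan's start row is inclusive while its stop row is exclusive. A sorted Python list stands in for the table, binary search stands in for the region server's seek, and the name `bounded_scan` and the date-like row keys are hypothetical.

```python
import bisect
from typing import List, Optional


def bounded_scan(sorted_keys: List[bytes],
                 start_row: Optional[bytes] = None,
                 stop_row: Optional[bytes] = None) -> List[bytes]:
    """Return the row keys a scan over `sorted_keys` would visit.

    start_row is inclusive and stop_row is exclusive; None leaves that
    side of the range unbounded.
    """
    # Binary search jumps straight to the first key >= start_row
    # instead of reading every row from the beginning of the table.
    lo = 0 if start_row is None else bisect.bisect_left(sorted_keys, start_row)
    # Stop before the first key >= stop_row, so stop_row itself is excluded.
    hi = len(sorted_keys) if stop_row is None else bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


# Hypothetical row keys, already in byte order as HBase keeps them.
keys = [b"2012-10-13", b"2012-10-14", b"2012-10-15", b"2012-10-16", b"2012-10-17"]

print(len(bounded_scan(keys)))                           # no row-key predicate: all 5 rows
print(bounded_scan(keys, b"2012-10-14", b"2012-10-16"))  # only the rows in range
```

Without a row-key predicate both bounds are None and every row is read; with one, only the slice between the bounds is touched, which is the saving the start/stop keys provide.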

On Mon, Oct 15, 2012 at 10:09 PM, kulkarni.swarnim@gmail.com <
kulkarni.swarnim@gmail.com> wrote:
