Posted to user@hive.apache.org by Kumar V <ku...@yahoo.com> on 2014/09/18 17:57:47 UTC

Handling updates to Bucketed Table

Hi,
    I would like to know how to handle frequent updates to bucketed tables.  Is there a way to update them without a rebuild?
I have a bucketed table with monthly partitions, but I have to update the table every day.  Is there a way to do this without rebuilding the current month's partition every day?  Or is this the wrong use case for a bucketed table?
This table is joined with another table, so I thought bucketing would speed up the queries.  What are my options?

Please let me know.

Regards,
Murali.

Re: Handling updates to Bucketed Table

Posted by Alain Petrus <al...@gmail.com>.
Hi all,

Very interesting question.  In my case, the table is partitioned by date and has 8 buckets sorted on id.
I am wondering what happens when new data is added to this table.  The data will be put in the correct partition, but will it also be bucketed?

Thanks for your help,
Alain


On 18 Sep 2014, at 20:02, Kumar V <ku...@yahoo.com> wrote:

> Thx Nitin. I just wanted to confirm before I give up. I'll probably do a daily partition and see how it goes.
> 
> Thanks.


Re: Handling updates to Bucketed Table

Posted by Kumar V <ku...@yahoo.com>.
Thx Nitin. I just wanted to confirm before I give up. I'll probably do a daily partition and see how it goes.

Thanks.


On Thursday, September 18, 2014 12:30 PM, Nitin Pawar <ni...@gmail.com> wrote:
 


When you bucket the data in a partition,
one file is created per bucket, and each row is assigned to a bucket by hashing the bucketing key.

If you then add more data to the same partition, the new rows hash into those existing buckets, so the bucket files would need to be rebuilt.

I would prefer a day-level partition under the month level, where I write the data once a day and bucket it there.


I am not sure Hive supports appending to bucketed files yet.
Please wait for others to answer as well.



Re: Handling updates to Bucketed Table

Posted by Nitin Pawar <ni...@gmail.com>.
When you bucket the data in a partition,
one file is created per bucket, and each row is assigned to a bucket by
hashing the bucketing key.

If you then add more data to the same partition, the new rows hash into
those existing buckets, so the bucket files would need to be rebuilt.

I would prefer a day-level partition under the month level, where I write
the data once a day and bucket it there.


I am not sure Hive supports appending to bucketed files yet.
Please wait for others to answer as well.
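
To make the point above concrete, here is a small Python simulation (not Hive code) of which bucket files a daily load touches under monthly versus daily partitioning. The bucket rule used here, a non-negative hash of the key modulo the bucket count with integer keys hashing to themselves, is an assumption for illustration; the actual hash depends on the Hive version and column type. The names bucket_for and touched_files, the bucket count of 8, and the row ids are all made up for the example.

```python
NUM_BUCKETS = 8  # hypothetical bucket count, chosen only for this sketch


def bucket_for(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Assumed rule: non-negative hash of the key, modulo the bucket count.

    For integer keys the hash is taken to be the value itself.
    """
    return (key & 0x7FFFFFFF) % num_buckets


def touched_files(rows, partition_of):
    """Return the set of (partition, bucket) files a batch of rows lands in."""
    return {(partition_of(day), bucket_for(key)) for day, key in rows}


def by_month(day: str) -> str:
    return day[:7]  # "2014-09-18" -> "2014-09"


def by_day(day: str) -> str:
    return day


# Rows already loaded yesterday and rows arriving today: (day, id) pairs.
old_rows = [("2014-09-17", i) for i in range(0, 20)]
new_rows = [("2014-09-18", i) for i in range(100, 120)]

# Files that already exist AND receive new rows must be rewritten.
rewrite_monthly = touched_files(old_rows, by_month) & touched_files(new_rows, by_month)
rewrite_daily = touched_files(old_rows, by_day) & touched_files(new_rows, by_day)

print("monthly partition, files to rewrite:", len(rewrite_monthly))
print("daily partition, files to rewrite:", len(rewrite_daily))
```

Under these assumptions, every bucket file in the monthly partition receives new rows each day, while with daily partitions the new rows only go into files of a partition that did not exist before, so nothing already written has to be rebuilt.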

On Thu, Sep 18, 2014 at 9:27 PM, Kumar V <ku...@yahoo.com> wrote:

> Hi,
>     I would like to know how to handle frequent updates to bucketed
> tables.  Is there a way to update them without a rebuild?
> I have a bucketed table with monthly partitions, but I have to update
> the table every day.  Is there a way to do this without rebuilding the
> current month's partition every day?  Or is this the wrong use case for
> a bucketed table?
> This table is joined with another table, so I thought bucketing would
> speed up the queries.  What are my options?
>
> Please let me know.
>
> Regards,
> Murali.
>
>


-- 
Nitin Pawar