Posted to user@hive.apache.org by unmesha sreeveni <un...@gmail.com> on 2014/12/02 06:20:33 UTC

Update not reusing blocks in previous and current versions

I tried updating my records in a previous Hive version, and also tried out
UPDATE in Hive 0.14.0, the newer version that supports updates.

I created a table with 3 buckets holding 180 MB of data. In my warehouse
directory the data is stored in 3 different blocks:

delta_0000012_0000012
--- Block ID: 1073751752
--- Block ID: 1073751750
--- Block ID: 1073751753

After doing an update, I am getting 2 directories:

delta_0000012_0000012
--- Block ID: 1073751752
--- Block ID: 1073751750
--- Block ID: 1073751753
AND
delta_0000014_0000014
--- Block ID: 1073752044

I.e. the blocks are not reused.
Is my understanding correct?
Any pointers?
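
For context, here is a minimal sketch of the kind of setup involved; the
table and column names are illustrative, not my actual schema:

  -- session settings assumed for ACID tables in Hive 0.14 (illustrative)
  SET hive.support.concurrency=true;
  SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
  SET hive.enforce.bucketing=true;

  -- a 3-bucket ORC table with transactions enabled
  CREATE TABLE employee (
    id     INT,
    name   STRING,
    salary DOUBLE
  )
  CLUSTERED BY (id) INTO 3 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional'='true');

  -- an update like this writes a new delta directory; it never
  -- rewrites the files of an earlier transaction
  UPDATE employee SET salary = salary * 1.1 WHERE id = 42;

Block IDs like the ones above can be listed with, e.g.,
hdfs fsck /user/hive/warehouse/employee -files -blocks (path illustrative).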


-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/

Re: Re: Re: Update not reusing blocks in previous and current versions

Posted by unmesha sreeveni <un...@gmail.com>.
Thanks.
Once the hive.compactor.initiator.on property is set to true, does the
merge operation take place and reuse the blocks after each update, or do we
need to run the ALTER statement mentioned in

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-ConfigurationValuestoSetforCompaction
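
For example, does the compactor pick things up on its own once the
metastore has something like

  hive.compactor.initiator.on = true
  hive.compactor.worker.threads = 1   (must be > 0 for compactions to run)

or do we also have to trigger it by hand with something like (table name
illustrative):

  ALTER TABLE employee COMPACT 'minor';  -- merge deltas into a single delta
  ALTER TABLE employee COMPACT 'major';  -- rewrite base + deltas into a new base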



Re: Re: Re: Update not reusing blocks in previous and current versions

Posted by vic0777 <vi...@163.com>.

The compaction operation will merge the data, and then the blocks may be reused.
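
Roughly, with illustrative transaction IDs: a minor compaction merges the
deltas into a single delta per bucket, while a major compaction rewrites
everything into a new base directory:

  before:  delta_0000012_0000012/  delta_0000014_0000014/
  minor:   delta_0000012_0000014/
  major:   base_0000014/

Once no readers still need them, the obsolete delta directories are removed
by the cleaner, and that is what actually frees their HDFS blocks.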





Re: Re: Update not reusing blocks in previous and current versions

Posted by unmesha sreeveni <un...@gmail.com>.
So that block will not be reused, right? If we update the entire block,
and at some point we no longer need that record, the block will be
wasted, right?
They need to release the blocks for further writes, right?
Am I correct?


Re: Re: Update not reusing blocks in previous and current versions

Posted by vic0777 <vi...@163.com>.

The document describes how transactions work and what the data layout is: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions. See the "Basic design" section. HDFS files are immutable, so Hive creates a delta directory for every transaction and merges the deltas at read time; the update is therefore not written to the same block.
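
An illustrative layout for an unpartitioned table (the IDs are made up):

  /user/hive/warehouse/t/
    delta_0000012_0000012/bucket_00000   <- the original insert transaction
    delta_0000014_0000014/bucket_00000   <- the update transaction

A SELECT merges these directories on the fly, so the earlier delta files,
and the HDFS blocks behind them, are left untouched until a compaction
rewrites them.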

Wantao







Re: Update not reusing blocks in previous and current versions

Posted by unmesha sreeveni <un...@gmail.com>.
Why is Hive's UPDATE not reusing the blocks?
The update is not written to the same block; why is that?

