Posted to user@hbase.apache.org by yonghu <yo...@gmail.com> on 2012/08/14 12:04:38 UTC

What happened in hlog if data are deleted caused by ttl?

My HBase version is 0.92. I tried the following:
1. Created a table 'test' with a column family 'course' whose TTL=5.
2. Inserted one row into the table. 5 seconds later, the row was deleted.
Later, when I checked the log info of the 'test' table, I found only the
inserted information, not any deletion information.

Can anyone tell me what information is written into the HLog when data is
deleted by TTL, or whether, in this situation, nothing is written into
the HLog at all? If there is no deletion information in the log, how can
we guarantee that the data recovered from the log is correct?

Thanks!

Yong

RE: What happened in hlog if data are deleted caused by ttl?

Posted by "Ramkrishna.S.Vasudevan" <ra...@huawei.com>.
Hi

Just to add on: the HLog is just an edit log. Any transactional updates
(Puts/Deletes) are simply appended to the HLog. It is the Scanner that takes
care of TTL, which it calculates from the TTL configured at the column
family (Store) level.
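To make the point concrete, here is a minimal Python sketch of a toy store. The `Cell` and `Store` names are hypothetical, not HBase classes, and Deletes are left out. The edit log only ever receives the Put, while the TTL, configured once per column family, is applied by the read path when deciding what to return. The sketch assumes a cell stays visible while its timestamp is no older than `now - TTL`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    timestamp_ms: int  # the write time stored with every insert
    value: str


@dataclass
class Store:
    """Toy column-family store: an append-only edit log plus one CF-level TTL."""
    ttl_seconds: int
    edit_log: list = field(default_factory=list)  # stands in for the HLog

    def put(self, cell: Cell) -> None:
        # Only the Put is logged; nothing about expiry is ever written.
        self.edit_log.append(("PUT", cell))

    def scan(self, now_ms: int) -> list:
        # Visibility is decided at read time from the CF TTL.
        oldest_visible_ms = now_ms - self.ttl_seconds * 1000
        return [cell for op, cell in self.edit_log
                if op == "PUT" and cell.timestamp_ms >= oldest_visible_ms]


store = Store(ttl_seconds=5)
store.put(Cell("tom", "course:english", timestamp_ms=0, value="90"))
print(len(store.scan(now_ms=1_000)))  # 1: still inside the 5 s TTL
print(len(store.scan(now_ms=6_000)))  # 0: expired for readers...
print(len(store.edit_log))            # 1: ...but the log still holds just the Put
```

The log never changes when the cell expires; only the answer to the read does.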

Regards
Ram

> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Tuesday, August 14, 2012 8:51 PM
> To: user@hbase.apache.org
> Subject: Re: What happened in hlog if data are deleted caused by ttl?
> 
> Yes, TTL deletions are done only during compactions. They aren't
> "Deleted" in the sense of what a Delete insert signifies, but are
> rather eliminated in the write process when new
> storefiles are written out - if the value being written to the
> compacted store has already expired.
> 
> 
> 
> --
> Harsh J


Re: What happened in hlog if data are deleted caused by ttl?

Posted by yonghu <yo...@gmail.com>.
Thanks for your response. Can you tell me how the data is deleted due
to the TTL? Which module in HBase triggers the deletion? You mentioned
the scanner; does that mean the scanner scans the store files
periodically and then deletes the data that has expired?

regards!

Yong

On Thu, Aug 16, 2012 at 6:16 AM, Ramkrishna.S.Vasudevan
<ra...@huawei.com> wrote:
> Hi
>
> Just to add on,  The HLog is just an edit log.  Any transaction updates(
> Puts/Deletes) are just added to HLog.  It is the Scanner that takes care of
> the TTL part which is calculated from the TTL configured at the column
> family(Store) level.
>
> Regards
> Ram

Re: What happened in hlog if data are deleted caused by ttl?

Posted by Harsh J <ha...@cloudera.com>.
Yes, TTL deletions are done only during compactions. They aren't
"deleted" in the sense that a Delete insert signifies; rather, they are
eliminated in the write process when new store files are written out,
if the value being written to the compacted store has already expired.
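A minimal sketch of the compaction behaviour described above. It assumes store files are plain lists of `(row, column, timestamp_ms, value)` tuples, and `compact` is a hypothetical helper. HBase's real compaction also handles versions, Delete markers and its own sort order. Expired cells are simply not written to the new file, and no Delete marker is produced for them.

```python
def compact(store_files, ttl_seconds, now_ms):
    """Merge store files into one new file, dropping TTL-expired cells.

    Each store file is a list of (row, column, timestamp_ms, value) tuples.
    Expired cells are simply not copied to the output; no Delete marker
    is written for them. (Sort order is simplified to plain tuple order.)
    """
    oldest_kept_ms = now_ms - ttl_seconds * 1000
    merged = sorted(cell for store_file in store_files for cell in store_file)
    return [cell for cell in merged if cell[2] >= oldest_kept_ms]


old_file = [("tom", "course:english", 1_000, "90")]
new_file = [("ann", "course:math", 9_000, "75")]
print(compact([old_file, new_file], ttl_seconds=5, now_ms=10_000))
# [('ann', 'course:math', 9000, '75')]
```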

On Tue, Aug 14, 2012 at 8:40 PM, yonghu <yo...@gmail.com> wrote:
> Hi Hars,
>
> Thanks for your reply. If I understand you right, it means the ttl
> deletion will not reflect in log.



-- 
Harsh J

Re: What happened in hlog if data are deleted caused by ttl?

Posted by yonghu <yo...@gmail.com>.
Hi Harsh,

Thanks for your reply. If I understand you correctly, it means the TTL
deletion will not be reflected in the log.

On Tue, Aug 14, 2012 at 3:24 PM, Harsh J <ha...@cloudera.com> wrote:
> Hi Yonghu,
>
> A timestamp is stored along with each insert. The ttl is maintained at
> the region-store level. Hence, when the log replays, all entries with
> expired TTLs are automatically omitted.
>
> Also, TTL deletions happen during compactions, and hence do not
> carry/need Delete events. When scanning a store file, TTL-expired
> entries are automatically skipped away.
>
>
>
> --
> Harsh J

Re: What happened in hlog if data are deleted caused by ttl?

Posted by yonghu <yo...@gmail.com>.
Another interesting point is that the TTL data does not exist in the
HFile. I ran the following test:

hbase(main):003:0> create 'test',{TTL=>'200',NAME=>'course'}
0 row(s) in 1.1420 seconds

hbase(main):005:0> put 'test','tom','course:english',90
0 row(s) in 0.0320 seconds

hbase(main):006:0> flush 'test'
0 row(s) in 0.1680 seconds

hbase(main):007:0> scan 'test'
ROW                   COLUMN+CELL
 tom                  column=course:english, timestamp=1345623867082, value=90
1 row(s) in 0.0350 seconds

./hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f
/hbase/test/abe4d5adaa650cdd46d26dca0bf85b72/course/8c77fb321f934592869f9852f777b22e
Scanning -> /hbase/test/abe4d5adaa650cdd46d26dca0bf85b72/course/8c77fb321f934592869f9852f777b22e
12/08/22 10:27:39 INFO hfile.CacheConfig: Allocating LruBlockCache
with maximum size 247.9m
Scanned kv count -> 1
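For reference, the HFile tool reports the cells stored in the file (here, one). Whether that cell is still returned by a Get or Scan depends only on its timestamp and the column family's TTL. Below is a sketch of that arithmetic using the values from this test (TTL 200 seconds, timestamp 1345623867082 ms). The helper names are hypothetical, and the rule "expired once now exceeds timestamp + TTL" is an assumption of the sketch.

```python
TTL_SECONDS = 200               # TTL=>'200' in the create statement
CELL_TS_MS = 1345623867082      # timestamp shown by the scan above


def expires_at_ms(timestamp_ms: int, ttl_seconds: int) -> int:
    """Last millisecond at which the cell is still visible (sketch assumption)."""
    return timestamp_ms + ttl_seconds * 1000


def is_expired(timestamp_ms: int, ttl_seconds: int, now_ms: int) -> bool:
    return now_ms > expires_at_ms(timestamp_ms, ttl_seconds)


deadline = expires_at_ms(CELL_TS_MS, TTL_SECONDS)
print(deadline)                                           # 1345624067082
print(is_expired(CELL_TS_MS, TTL_SECONDS, deadline))      # False
print(is_expired(CELL_TS_MS, TTL_SECONDS, deadline + 1))  # True
```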

So I guess TTL data is managed only in the memstore. But then the
question is: what happens if the memstore doesn't have enough room to
accept new incoming TTL data? Can anybody explain?

Thanks!

Yong
On Wed, Aug 22, 2012 at 10:19 AM, yonghu <yo...@gmail.com> wrote:
> I can fully understand normal deletion. But, in my point of view, ttl
> deletion is different than the normal deletion. The insertion of ttl
> data is recorded in hlog. But the ttl deletion is not recorded by
> hlog. So, it failure occurs, should the ttl data be reinserted to data
> or can we discard the certain ttl data? Moreover, ttl deletion is not
> executed at data compaction time. Scanner needs to periodically scan
> each Store file to execute deletion.
>
> regards!
>
> Yong

Re: What happened in hlog if data are deleted caused by ttl?

Posted by yonghu <yo...@gmail.com>.
Sorry for that. I didn't use the right parameter. Now I get the point.

regards!

Yong

On Wed, Aug 22, 2012 at 10:49 AM, Harsh J <ha...@cloudera.com> wrote:
> Hey Yonghu,
>
> You are right that TTL "deletions" (it isn't exactly a delete, its
> more of a compact-time skip wizardry) do not go to the HLog as
> "events". Know that TTLs aren't applied "per-cell", they are applied
> on the whole CF globally. So there is no such thing as a TTL-write or
> a TTL-delete event. In fact, the Region-level Coprocessor too has no
> hooks for "TTL-events", as seen at
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html,
> for this doesn't happen on triggers.
>
> What you say about the compaction part is wrong however. Compaction
> too runs a regular store-file scanner to compact, and so does the
> regular Scan operation, to read (Both use the same file scanning
> mechanism/code). So there's no difference in how compact or a client
> scan handle TTL-expired row values from a store file, when reading it
> up.
>
> I also am not able to understand what your sample shell command list
> shows. As I see it, its shown that the HFile did have the entry in it
> after you had flushed it. Note that you mentioned the TTL at the CF
> level when creating the table, not in your "put" statement, and this
> is a vital point in understanding how TTLs work.
>
>
>
> --
> Harsh J

Re: What happened in hlog if data are deleted caused by ttl?

Posted by Harsh J <ha...@cloudera.com>.
Hey Yonghu,

You are right that TTL "deletions" (it isn't exactly a delete; it's
more of a compact-time skip wizardry) do not go to the HLog as
"events". Note that TTLs aren't applied "per-cell"; they are applied
to the whole CF globally. So there is no such thing as a TTL-write or
a TTL-delete event. In fact, the region-level coprocessor has no
hooks for "TTL events" either, as seen at
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html,
since expiry is not triggered by any event.
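The "TTL lives on the column family, not on the cell" point can be illustrated with a small Python model. The `ColumnFamily` class and `visible` helper are hypothetical, not the HBase API. Cells carry only a timestamp, and visibility is recomputed from the family's TTL on every read, so no TTL event ever has to be recorded. One side effect in this model is that changing the family's TTL changes which not-yet-compacted cells are visible.

```python
from dataclasses import dataclass


@dataclass
class ColumnFamily:
    """Hypothetical CF descriptor: the TTL is a property of the family."""
    name: str
    ttl_seconds: int


# Cells carry only (row, qualifier, timestamp_ms, value); there is no
# per-cell TTL field, and no "expire" event is recorded anywhere.
cells = [("tom", "english", 0, "90"), ("tom", "math", 8_000, "75")]


def visible(cf: ColumnFamily, cells: list, now_ms: int) -> list:
    cutoff_ms = now_ms - cf.ttl_seconds * 1000
    return [cell for cell in cells if cell[2] >= cutoff_ms]


course = ColumnFamily("course", ttl_seconds=5)
print(len(visible(course, cells, now_ms=10_000)))  # 1: only 'math' is recent enough
course.ttl_seconds = 60  # changing the family TTL re-evaluates every stored cell
print(len(visible(course, cells, now_ms=10_000)))  # 2
```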

What you say about the compaction part is wrong, however. Compaction
runs a regular store-file scanner to compact, just as the regular Scan
operation does to read (both use the same file-scanning
mechanism/code). So there's no difference in how a compaction or a
client scan handles TTL-expired row values when reading them from a
store file.
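A sketch of the "same scanner for both paths" argument. It assumes a hypothetical `store_file_scanner` generator that skips expired cells and is reused by a client scan and by a compaction. The function names are illustrative only. Because both paths share one filter, what a scan returns before a compaction matches what the compaction writes out.

```python
def store_file_scanner(cells, ttl_seconds, now_ms):
    """Single read path shared by scans and compactions; skips expired cells."""
    cutoff_ms = now_ms - ttl_seconds * 1000
    for cell in sorted(cells):
        if cell[2] >= cutoff_ms:
            yield cell


def client_scan(cells, ttl_seconds, now_ms):
    # A client Scan returns whatever the shared scanner yields.
    return list(store_file_scanner(cells, ttl_seconds, now_ms))


def compact_store(cells, ttl_seconds, now_ms):
    # A compaction writes whatever the shared scanner yields into a new
    # (immutable) file; expired cells never reach the output.
    return tuple(store_file_scanner(cells, ttl_seconds, now_ms))


cells = [("tom", "english", 1_000, "90"), ("tom", "math", 9_000, "75")]
before = client_scan(cells, ttl_seconds=5, now_ms=10_000)
new_file = compact_store(cells, ttl_seconds=5, now_ms=10_000)
after = client_scan(new_file, ttl_seconds=5, now_ms=10_000)
print(before == list(new_file) == after)  # True
```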

I also don't understand what your sample shell command list is meant
to show. As I see it, it shows that the HFile did have the entry in it
after you flushed it. Note that you specified the TTL at the CF level
when creating the table, not in your "put" statement, and this is a
vital point in understanding how TTLs work.

On Wed, Aug 22, 2012 at 1:49 PM, yonghu <yo...@gmail.com> wrote:
> I can fully understand normal deletion. But, in my point of view, ttl
> deletion is different than the normal deletion. The insertion of ttl
> data is recorded in hlog. But the ttl deletion is not recorded by
> hlog. So, it failure occurs, should the ttl data be reinserted to data
> or can we discard the certain ttl data? Moreover, ttl deletion is not
> executed at data compaction time. Scanner needs to periodically scan
> each Store file to execute deletion.
>
> regards!
>
> Yong



-- 
Harsh J

Re: What happened in hlog if data are deleted caused by ttl?

Posted by yonghu <yo...@gmail.com>.
I can fully understand normal deletion. But from my point of view, TTL
deletion is different from normal deletion. The insertion of TTL data
is recorded in the HLog, but the TTL deletion is not. So, if a failure
occurs, should the TTL data be reinserted, or can we discard that
TTL data? Moreover, TTL deletion is not executed at compaction time;
the scanner needs to periodically scan each store file to execute the
deletion.

regards!

Yong



On Tue, Aug 21, 2012 at 5:29 PM, jmozah <jm...@gmail.com> wrote:
> This helped me http://hadoop-hbase.blogspot.in/2011/12/deletion-in-hbase.html
>
>
> ./Zahoor
> HBase Musings

Re: What happened in hlog if data are deleted caused by ttl?

Posted by jmozah <jm...@gmail.com>.
This helped me http://hadoop-hbase.blogspot.in/2011/12/deletion-in-hbase.html


./Zahoor
HBase Musings


On 14-Aug-2012, at 6:54 PM, Harsh J <ha...@cloudera.com> wrote:

> Hi Yonghu,
> 
> A timestamp is stored along with each insert. The ttl is maintained at
> the region-store level. Hence, when the log replays, all entries with
> expired TTLs are automatically omitted.
> 
> Also, TTL deletions happen during compactions, and hence do not
> carry/need Delete events. When scanning a store file, TTL-expired
> entries are automatically skipped away.
> 
> 
> 
> -- 
> Harsh J


Re: What happened in hlog if data are deleted caused by ttl?

Posted by Harsh J <ha...@cloudera.com>.
Hi Yonghu,

A timestamp is stored along with each insert, and the TTL is maintained
at the region-store level. Hence, when the log is replayed, all entries
with expired TTLs are automatically omitted.

Also, TTL deletions happen during compactions, and hence do not
carry/need Delete events. When scanning a store file, TTL-expired
entries are automatically skipped.
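A minimal sketch of log replay as described above. It assumes log entries are `(row, column, timestamp_ms, value)` tuples, and `replay_log` is a hypothetical helper, not HBase's actual recovery code. The only inputs needed to drop an expired entry are its stored timestamp and the store's TTL. Nothing in the log marks the expiry.

```python
def replay_log(log_entries, ttl_seconds, now_ms):
    """Rebuild a memstore from logged Puts, omitting entries already past the TTL.

    log_entries: (row, column, timestamp_ms, value) tuples in log order.
    Nothing in the log marks expiry; the stored timestamp plus the
    store's TTL is enough to decide.
    """
    cutoff_ms = now_ms - ttl_seconds * 1000
    memstore = {}
    for row, column, ts_ms, value in log_entries:
        if ts_ms < cutoff_ms:
            continue  # already expired by the time the log is replayed
        memstore[(row, column, ts_ms)] = value
    return memstore


log = [("tom", "course:english", 1_000, "90"),
       ("ann", "course:english", 7_000, "80")]
print(replay_log(log, ttl_seconds=5, now_ms=10_000))
# {('ann', 'course:english', 7000): '80'}
```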

On Tue, Aug 14, 2012 at 3:34 PM, yonghu <yo...@gmail.com> wrote:
> My hbase version is 0.92. I tried something as follows:
> 1.Created a table 'test' with 'course' in which ttl=5.
> 2. inserted one row into the table. 5 seconds later, the row was deleted.
> Later when I checked the log infor of 'test' table, I only found the
> inserted information but not deleted information.
>
> Can anyone tell me which information is written into hlog when data is
> deleted by ttl or in this situation, no information is written into
> the hlog. If there is no information of deletion in the log, how can
> we guarantee the data recovered by log are correct?
>
> Thanks!
>
> Yong



-- 
Harsh J