Posted to user@hbase.apache.org by Bing Jiang <ji...@gmail.com> on 2013/05/22 14:39:02 UTC

Risk about RS logs clean ?

Hi all,
I want to know how the RS removes HLogs that are no longer needed.
lastSeqNum stores <RegionName, latest KV seq id>
and
outputfiles stores <last seq id before new hlog file, file path>

So how does the RS guarantee that the KVs in an HLog about to be removed have
already been flushed from the memstore into an HFile?
I have tried reading the source code to make sense of it, but I am not sure
whether this could be a source of data loss.

Thanks.
-- 
Bing Jiang
Tel:(86)134-2619-1361
weibo: http://weibo.com/jiangbinglover
BLOG: http://blog.sina.com.cn/jiangbinglover
National Research Center for Intelligent Computing Systems
Institute of Computing technology
Graduate University of Chinese Academy of Science

Re: Risk about RS logs clean ?

Posted by Sergey Shelukhin <se...@hortonworks.com>.
IIRC, the version in previous branches should have an epic lock somewhere
(cacheFlushLock or something like that) that should also make these map
manipulations safe.

On Wed, May 22, 2013 at 6:27 PM, Bing Jiang <ji...@gmail.com> wrote:

> Hi Sergey,
> The version of HBase in our environment is 0.94.3, while FSHLog.java
> is from 0.95 or later.
> It adds the following code in FSHLog::cleanOldLogs:
>
>     long oldestOutstandingSeqNum = Long.MAX_VALUE;
>     synchronized (oldestSeqNumsLock) {
>       Long oldestFlushing = (oldestFlushingSeqNums.size() > 0)
>         ? Collections.min(oldestFlushingSeqNums.values()) : Long.MAX_VALUE;
>       Long oldestUnflushed = (oldestUnflushedSeqNums.size() > 0)
>         ? Collections.min(oldestUnflushedSeqNums.values()) : Long.MAX_VALUE;
>       oldestOutstandingSeqNum = Math.min(oldestFlushing, oldestUnflushed);
>     }
>
> This differs from the corresponding code in 0.94.3:
>
>     private byte [][] cleanOldLogs() throws IOException {
>       Long oldestOutstandingSeqNum = getOldestOutstandingSeqNum();
>       ...
>     }
>     private Long getOldestOutstandingSeqNum() {
>       return Collections.min(this.lastSeqWritten.values());
>     }
>
> And I think the version in trunk is safe.
>
> Thanks, Sergey.

Re: Risk about RS logs clean ?

Posted by Bing Jiang <ji...@gmail.com>.
Hi Sergey,
The version of HBase in our environment is 0.94.3, while FSHLog.java
is from 0.95 or later.
It adds the following code in FSHLog::cleanOldLogs:

    long oldestOutstandingSeqNum = Long.MAX_VALUE;
    synchronized (oldestSeqNumsLock) {
      Long oldestFlushing = (oldestFlushingSeqNums.size() > 0)
        ? Collections.min(oldestFlushingSeqNums.values()) : Long.MAX_VALUE;
      Long oldestUnflushed = (oldestUnflushedSeqNums.size() > 0)
        ? Collections.min(oldestUnflushedSeqNums.values()) : Long.MAX_VALUE;
      oldestOutstandingSeqNum = Math.min(oldestFlushing, oldestUnflushed);
    }

This differs from the corresponding code in 0.94.3:

    private byte [][] cleanOldLogs() throws IOException {
      Long oldestOutstandingSeqNum = getOldestOutstandingSeqNum();
      ...
    }
    private Long getOldestOutstandingSeqNum() {
      return Collections.min(this.lastSeqWritten.values());
    }

And I think the version in trunk is safe.
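
For anyone following along, here is a minimal sketch of how a threshold
like oldestOutstandingSeqNum could be used to pick the logs that are safe
to remove. It assumes, as described in the original question, that
outputfiles maps the last seq id written before each log roll to that
log's path. The class OldLogSelector, the method deletableLogs and the
hlog.N paths are made up for illustration; this is not the HBase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper, not part of HBase: selects logs that hold only flushed edits. */
class OldLogSelector {

  /**
   * Assumption: each key in outputfiles is the highest seq id contained in that log.
   * A log is deletable only if that key is strictly below the oldest seq id that is
   * still unflushed or being flushed, so every edit in the log is already in an HFile.
   */
  static List<String> deletableLogs(SortedMap<Long, String> outputfiles,
                                    long oldestOutstandingSeqNum) {
    // headMap returns the entries whose keys are strictly less than the argument.
    return new ArrayList<>(outputfiles.headMap(oldestOutstandingSeqNum).values());
  }

  public static void main(String[] args) {
    SortedMap<Long, String> outputfiles = new TreeMap<>();
    outputfiles.put(100L, "hlog.1"); // illustrative paths only
    outputfiles.put(200L, "hlog.2");
    outputfiles.put(300L, "hlog.3");

    // Some region still has an unflushed edit with seq id 250, so hlog.3 must stay.
    System.out.println(deletableLogs(outputfiles, 250L)); // prints [hlog.1, hlog.2]
  }
}
```

Under this model a log is never removed while any region still has an
edit in it that has not reached an HFile, which is the guarantee the
original question asks about.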

Thanks, Sergey.


2013/5/23 Sergey Shelukhin <se...@hortonworks.com>

> FSHLog (in trunk) stores, for each region, the earliest seqnum in the
> current memstore and the earliest seqnum being flushed (see
> FSHLog::start/complete/abortCacheFlush). When old logs are cleaned up, logs
> with seqnums above the earliest flushing/unflushed seqnum of any region are
> not deleted (see FSHLog::cleanOldLogs).




Re: Risk about RS logs clean ?

Posted by Sergey Shelukhin <se...@hortonworks.com>.
FSHLog (in trunk) stores, for each region, the earliest seqnum in the
current memstore and the earliest seqnum being flushed (see
FSHLog::start/complete/abortCacheFlush). When old logs are cleaned up, logs
with seqnums above the earliest flushing/unflushed seqnum of any region are
not deleted (see FSHLog::cleanOldLogs).
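
The bookkeeping described above can be modelled roughly as follows. This
is a simplified sketch, not the actual FSHLog source: the class
FlushSeqNumTracker, the append method and the use of plain String region
names are assumptions made for illustration. The field names and the
start/complete/abortCacheFlush method names follow the ones mentioned in
this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of per-region seqnum tracking; not the real FSHLog. */
class FlushSeqNumTracker {
  private final Object oldestSeqNumsLock = new Object();
  // Earliest seqnum still sitting in each region's current memstore.
  private final Map<String, Long> oldestUnflushedSeqNums = new HashMap<>();
  // Earliest seqnum in each region's memstore snapshot that is being flushed.
  private final Map<String, Long> oldestFlushingSeqNums = new HashMap<>();

  /** Only the first edit after a flush sets the region's earliest unflushed seqnum. */
  void append(String region, long seqNum) {
    synchronized (oldestSeqNumsLock) {
      oldestUnflushedSeqNums.putIfAbsent(region, seqNum);
    }
  }

  /** Flush begins: the region's earliest seqnum moves from "unflushed" to "flushing". */
  void startCacheFlush(String region) {
    synchronized (oldestSeqNumsLock) {
      Long oldest = oldestUnflushedSeqNums.remove(region);
      if (oldest != null) {
        oldestFlushingSeqNums.put(region, oldest);
      }
    }
  }

  /** Flush succeeded: the flushed edits no longer pin any log. */
  void completeCacheFlush(String region) {
    synchronized (oldestSeqNumsLock) {
      oldestFlushingSeqNums.remove(region);
    }
  }

  /** Flush failed: the edits are unflushed again, so restore the older seqnum. */
  void abortCacheFlush(String region) {
    synchronized (oldestSeqNumsLock) {
      Long flushing = oldestFlushingSeqNums.remove(region);
      if (flushing != null) {
        oldestUnflushedSeqNums.put(region, flushing);
      }
    }
  }

  /** Smallest seqnum across all regions that is not yet safely in an HFile. */
  long oldestOutstandingSeqNum() {
    synchronized (oldestSeqNumsLock) {
      long oldestFlushing = oldestFlushingSeqNums.isEmpty()
          ? Long.MAX_VALUE : Collections.min(oldestFlushingSeqNums.values());
      long oldestUnflushed = oldestUnflushedSeqNums.isEmpty()
          ? Long.MAX_VALUE : Collections.min(oldestUnflushedSeqNums.values());
      return Math.min(oldestFlushing, oldestUnflushed);
    }
  }
}
```

The point of keeping a separate "flushing" map in this model is that a
region's seqnum stays visible to log cleaning for the whole duration of
the flush, so a log cannot be removed while the flush that covers its
edits is still in progress or has been aborted.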
