Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2014/07/17 01:11:07 UTC

[jira] [Resolved] (HBASE-3099) optimization for log splitting (theory/suggestion)

     [ https://issues.apache.org/jira/browse/HBASE-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-3099.
-----------------------------------

    Resolution: Not a Problem

Probably superseded by distributed log splitting

> optimization for log splitting (theory/suggestion)
> --------------------------------------------------
>
>                 Key: HBASE-3099
>                 URL: https://issues.apache.org/jira/browse/HBASE-3099
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>
> Right now log splitting is slower than we'd like. The slow pace of log splitting is one of the reasons we have to keep a short, bounded limit on the number of outstanding log files. It would be nice to raise that limit, perhaps to hundreds of logs. That would improve efficiency, because we would no longer be force-flushing regions at non-ideal sizes.
> But more data means more to process. However, not all of the logs for a regionserver are actually useful, because some regions were flushed before the oldest log was trimmed, so their older edits are already persisted. If log recovery read each region's most recent flushed sequence id, the master could skip those entries during log splitting and avoid writing them to the per-region recovery logs. This would cut part of the IO, and if our serialization/deserialization code were clever, we might be able to avoid deserializing much of the data.
> It's not clear how effective or worthwhile this might be.
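The skipping idea in the description can be sketched as a simple per-region filter: an entry is kept only if its sequence id is greater than the last sequence id already flushed for its region. This is a minimal illustration under stated assumptions, not HBase's implementation. The class `SkipFlushedEditsSketch`, the `WalEntry` type, the `filterUnflushed` method, and the map of last flushed sequence ids are all hypothetical; how the master would actually obtain those flushed ids is not specified in the issue. The payload is kept as raw bytes to reflect the point that the filter only needs the region and sequence id, so skipped entries never have to be decoded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SkipFlushedEditsSketch {

    // Hypothetical, simplified stand-in for a write-ahead-log entry: the region
    // it belongs to, the sequence id assigned when it was logged, and the
    // still-serialized edit body.
    static final class WalEntry {
        final String region;
        final long seqId;
        final byte[] rawPayload;

        WalEntry(String region, long seqId, byte[] rawPayload) {
            this.region = region;
            this.seqId = seqId;
            this.rawPayload = rawPayload;
        }
    }

    // Keeps only entries newer than their region's last flushed sequence id.
    // Entries at or below that id are assumed to be persisted already, so they
    // are dropped without ever touching (deserializing) rawPayload.
    static List<WalEntry> filterUnflushed(List<WalEntry> entries,
                                          Map<String, Long> lastFlushedSeqIds) {
        List<WalEntry> kept = new ArrayList<>();
        for (WalEntry e : entries) {
            // A region with no recorded flush keeps every entry (-1 = nothing flushed).
            long flushed = lastFlushedSeqIds.getOrDefault(e.region, -1L);
            if (e.seqId > flushed) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<WalEntry> log = List.of(
            new WalEntry("regionA", 10, new byte[0]),
            new WalEntry("regionB", 11, new byte[0]),
            new WalEntry("regionA", 12, new byte[0]),
            new WalEntry("regionB", 13, new byte[0]));
        // Hypothetical state: regionA was flushed through seq 10; regionB never flushed.
        Map<String, Long> flushed = Map.of("regionA", 10L);
        List<WalEntry> toReplay = filterUnflushed(log, flushed);
        System.out.println(toReplay.size()); // prints 3
    }
}
```

Comparing against a single flushed sequence id per region keeps the check cheap: it needs only the entry header, which is what would let a clever reader skip deserializing most of the log. Whether the savings justify the extra bookkeeping is exactly the open question the description raises.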



--
This message was sent by Atlassian JIRA
(v6.2#6252)