Posted to common-dev@hadoop.apache.org by alakshman <gh...@gmail.com> on 2007/07/14 21:49:31 UTC

Discarding HLog files

I had a question about how writes are recorded in the HLogs. Each column
family that makes up a table has its own on-disk representation. However,
there is only one HLog for all tables. This means that on every write, the
individual HMemcaches for each column family in the row mutation are
updated, but the entire row is written to the HLog.

Now when a column family's HMemcache is flushed, is a token written to
the HLog indicating that the column family for this table has been
flushed? There may be other column families which have not yet been
flushed. Since we seem to write entire rows to the HLog, how can one tell
that the log file contains only flushed entries without a scan of the
entire file? Is a sequential scan unavoidable to determine whether the
HLog can be deleted when it is rolled?

Please explain.

Thanks
Avinash


Re: Discarding HLog files

Posted by Jim Kellerman <ji...@powerset.com>.
On Sat, 2007-07-14 at 12:49 -0700, alakshman wrote:
> I had a question about how writes are recorded in the HLogs. Each column
> family that makes up a table has its own on-disk representation. However,
> there is only one HLog for all tables.

This isn't quite true. There is one HLog per HRegionServer.

> This means that on every write, the individual HMemcaches for each
> column family in the row mutation are updated, but the entire row is
> written to the HLog.

Also not quite true. The entire row is not written; only the changes are
written to the HLog.
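
To make that concrete, here is a minimal sketch, in Java, of what a
single log append might carry. The LogRecord name and its fields are
invented for illustration rather than taken from the actual HBase
source; the point is only that each record holds the changed cells for
one mutation, and that a single HLog instance serves every region on
the server:

    import java.util.Map;
    import java.util.TreeMap;

    // Hypothetical shape of one HLog entry: only the cells touched by
    // this mutation are carried, never the full row.
    class LogRecord {
        final String regionName;  // region the edit applies to
        final byte[] row;         // row key of the mutation
        final long seqId;         // monotonically increasing per HLog
        // column -> new value; holds ONLY the changed cells
        final Map<String, byte[]> changes = new TreeMap<String, byte[]>();

        LogRecord(String regionName, byte[] row, long seqId) {
            this.regionName = regionName;
            this.row = row;
            this.seqId = seqId;
        }
    }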

> Now when a column family's HMemcache is flushed, is a token written to
> the HLog indicating that the column family for this table has been
> flushed? There may be other column families which have not yet been
> flushed. Since we seem to write entire rows to the HLog, how can one
> tell that the log file contains only flushed entries without a scan of
> the entire file?

When the memcache is flushed, it happens on a per-region basis. That is,
all the changes that apply to that region (all changed columns) are
written to disk. After the changes are flushed, a flushcache-complete
marker is written to the log, indicating that all changes older than this
sequence id can be ignored.
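
Roughly sketched, reusing the hypothetical LogRecord above (the real
marker and method names may differ):

    // After a region's memcache reaches disk, a marker like this is
    // appended so replay can skip that region's older edits without
    // reading them.
    class FlushCompleteMarker {
        final String regionName;  // region whose flush finished
        final long flushSeqId;    // edits older than this are on disk

        FlushCompleteMarker(String regionName, long flushSeqId) {
            this.regionName = regionName;
            this.flushSeqId = flushSeqId;
        }

        // During replay, an edit for this region can be ignored if it
        // is older than the flush id.
        boolean makesObsolete(LogRecord edit) {
            return edit.regionName.equals(regionName)
                && edit.seqId < flushSeqId;
        }
    }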

HLog maintains a couple of in-memory structures: one recording, for each
region, what the last flushed sequence number is, and a map of flush ids
to output files.
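
In sketch form, with invented field names:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Hypothetical bookkeeping kept by the HLog; the names are
    // illustrative, not the real fields.
    class HLogBookkeeping {
        // region name -> last flushed sequence number for that region
        final Map<String, Long> lastFlushedSeqId =
            new HashMap<String, Long>();
        // highest sequence id contained in each rolled log file
        final SortedMap<Long, String> seqIdToLogFile =
            new TreeMap<Long, String>();
    }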

When the log is rolled, it determines the oldest outstanding sequence
number (the oldest sequence number that has not been flushed) and knows
that it can discard all the files with sequence numbers older than the
oldest outstanding change.
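
Given those structures, the discard decision at roll time might look
something like this sketch (not the actual implementation):

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.SortedMap;

    class LogRoller {
        // A file can be deleted once its newest edit is older than
        // every edit that is still unflushed; nothing in it would
        // ever need to be replayed.
        static List<String> deletableLogs(
                Collection<Long> oldestUnflushedSeqIds, // one per region
                SortedMap<Long, String> seqIdToLogFile) {
            long oldestOutstanding = Long.MAX_VALUE;
            for (long seqId : oldestUnflushedSeqIds) {
                oldestOutstanding = Math.min(oldestOutstanding, seqId);
            }
            // headMap: every file whose highest seq id is strictly
            // older than the oldest outstanding change.
            return new ArrayList<String>(
                seqIdToLogFile.headMap(oldestOutstanding).values());
        }
    }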

If a region server crashes, the master determines which regions the
region server was serving, has the HLog split into a separate part for
each region, and leaves each part in a special location. When the master
reassigns a region, part of starting it up includes processing any log
entries that were not flushed (HRegion looks for an old log file in the
special location). Once the outstanding log entries have been processed,
the region can be brought online.
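
A rough illustration of the split step, again reusing the hypothetical
LogRecord and glossing over the file I/O the real code has to do:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class LogSplitter {
        // Partition the dead server's single log into one edit list
        // per region, preserving the original append order.
        static Map<String, List<LogRecord>> split(List<LogRecord> log) {
            Map<String, List<LogRecord>> perRegion =
                new HashMap<String, List<LogRecord>>();
            for (LogRecord rec : log) {
                List<LogRecord> edits = perRegion.get(rec.regionName);
                if (edits == null) {
                    edits = new ArrayList<LogRecord>();
                    perRegion.put(rec.regionName, edits);
                }
                edits.add(rec);
            }
            // Each list would then be written out to the special
            // recovery location where the region looks on startup.
            return perRegion;
        }
    }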

> Is a sequential scan unavoidable to determine whether the HLog can be
> deleted when it is rolled?
> 
> Please explain.
> 
> Thanks
> Avinash
-- 
Jim Kellerman, Senior Engineer; Powerset
jim@powerset.com