Posted to user@hbase.apache.org by Erdem Agaoglu <er...@gmail.com> on 2010/11/01 09:36:16 UTC

Re: Node failure causes weird META data?

Hi again,

I have re-checked our configuration to confirm that we have
dfs.support.append enabled, and I saw "Using syncFs -- HDFS-200" in the logs.
I inspected the logs around the log splits to find something, but I'm not
sure it's what we are looking for. In the first step of the scenario I
mentioned before (where we kill -9ed everything on the node that hosts the
ROOT region), HLog says (stripping hdfs:// prefixes and hostnames for clarity):

# Splitting 7 hlog(s) in .logs/F,60020,1287491528908

Then it goes over every single one, like:

# Splitting hlog 1 of 7
# Splitting hlog 2 of 7
# ...
# Splitting hlog 7 of 7

On the 7th hlog it WARNs with two lines:

# File .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546 might
be still open, length is 0
# Could not open .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546
for reading. File is emptyjava.io.EOFException

And it completes with:

# log file splitting completed in 80372 millis for
.logs/F,60020,1287491528908
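
For reference, here is a minimal sketch of how one could double-check such a
file from a Java client (assuming a plain Hadoop 0.20 client classpath, that
the hlog was written by SequenceFileLogWriter and so is a SequenceFile, and
that the full hdfs:// path is passed as the first argument; the class name is
just illustrative):

import java.io.EOFException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class CheckHlog {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path hlog = new Path(args[0]);   // e.g. hdfs://.../hbase/.logs/F,60020,.../...
    FileSystem fs = hlog.getFileSystem(conf);
    System.out.println("length = " + fs.getFileStatus(hlog).getLen());
    try {
      // A zero-length SequenceFile fails while reading its header with the
      // same EOFException the master reports during the split.
      new SequenceFile.Reader(fs, hlog, conf).close();
      System.out.println("header readable");
    } catch (EOFException e) {
      System.out.println("empty or unreadable: " + e);
    }
  }
}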

That empty hlog might be it, but in the sixth step (where we kill -9ed the
RegionServer that hosts the only META region), it split 2 hlogs without any
empty-file problems or any log above INFO, yet as I said before, our
testtable still got lost.
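
In case it helps anyone else poking at this, here is a minimal sketch of
scanning .META. from a client to see which region rows actually survived
(assuming the standard 0.89 client API; the class name is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class CountMetaRows {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable meta = new HTable(conf, ".META.");
    ResultScanner scanner = meta.getScanner(new Scan());
    int rows = 0;
    for (Result r : scanner) {
      // Each row key is a region name like "tablename,startkey,timestamp".
      System.out.println(Bytes.toStringBinary(r.getRow()));
      rows++;
    }
    scanner.close();
    System.out.println(rows + " region row(s) in .META.");
  }
}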

I'll try to reproduce the problem in a cleaner way, but in the meantime, any
pointers to problems we might have are greatly appreciated.


On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <er...@gmail.com> wrote:

> Thanks for the answer.
>
> I am pretty sure we have dfs.support.append enabled. I remember both the
> conf file and the logs, and don't recall seeing any errors on 60010. I
> crawled through the logs all yesterday but don't remember anything
> indicating a specific error either. But I'm not sure about that. Let me
> check and get back here on Monday.
>
> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
>
>> First thing I'd check is if your configuration has dfs.support.append,
>> you can confirm this by looking at your region server logs. When a RS
>> starts, it creates an HLog and will print out: "Using syncFs --
>> HDFS-200" if it's configured, else you'll see "syncFs -- HDFS-200 --
>> not available, dfs.support.append=false". Also the master web ui (on
>> port 60010) will print an error message regarding that.
>>
>> If it's all ok, then you should take a look at the master log when it
>> does the log splitting and see if it contains any obvious errors.
>>
>> J-D
>>
>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <er...@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > We have a testing cluster of 6 nodes on which we try to run an
>> > HBase/MapReduce application. In order to simulate a power failure we
>> > kill -9ed everything Hadoop-related on one of the slave nodes (DataNode,
>> > RegionServer, TaskTracker, ZK quorum peer, and I think the
>> > SecondaryNameNode was on this node too). We were expecting a smooth
>> > transition on all services, but we did not get one on the HBase end.
>> > While our regions seemed intact (not confirmed), we lost table
>> > definitions, which points to some kind of META region failure. So our
>> > application failed with several TableNotFoundExceptions. The simulation
>> > was conducted with no load and extremely small data (like 10 rows in 3
>> > tables).
>> >
>> > On our setup, HBase is 0.89.20100924, r1001068 while Hadoop
>> > runs 0.20.3-append-r964955-1240, r960957. Most of the configuration
>> > parameters are in default.
>> >
>> > If we did something wrong up to this point, please ignore the rest of
>> > the message, as I'll now try to explain what we did to reproduce it,
>> > which might be irrelevant.
>> >
>> > Say the machines are named A, B, C, D, E, F, where A is the master-like
>> > node, the others are slaves, and the power failure is on F. Since we have
>> > little data, we have one ROOT and only one META region. I'll try to sum
>> > up the whole scenario.
>> >
>> > A: NN, DN, JT, TT, HM, RS
>> > B: DN, TT, RS, ZK
>> > C: DN, TT, RS, ZK
>> > D: DN, TT, RS, ZK
>> > E: DN, TT, RS, ZK
>> > F: SNN, DN, TT, RS, ZK
>> >
>> > 0. Initial state -> ROOT: F, META: A
>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables; BTW we waited for
>> >    about half an hour and got nothing
>> > 2. Put F back online -> No effect
>> > 3. Create a table 'testtable' to see if we lose it
>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost 'testtable'
>> >    but got back our tables from before the simulation. It seemed like the
>> >    table definitions were revived because A had META before the
>> >    simulation.
>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost 2 out of
>> >    our original 6 tables, while 'testtable' was revived. That small data
>> >    seems corrupted too, as our Scans don't finish.
>> > 8. Run to mailing-list.
>> > 8. Run to mailing-list.
>> >
>> > First of all, thanks for reading up to this point. From where we are
>> > now, we are not even sure whether this is the expected behavior (i.e. if
>> > the ROOT or META region dies, we lose data and must do something like
>> > hbck), whether we are missing a configuration, or whether this is a bug.
>> > Needless to say, we are relatively new to HBase, so the last possibility
>> > is that we simply didn't understand it at all.
>> >
>> > Thanks in advance for any ideas.
>> >
>> > --
>> > erdem agaoglu
>> >
>>
>
>
>
> --
> erdem agaoglu
>



-- 
erdem agaoglu

Re: Node failure causes weird META data?

Posted by Erdem Agaoglu <er...@gmail.com>.
We've applied the patch and it works great. Our hlogs seem intact after
rolls, and we were unable to crash the cluster by kill -9ing any RS.
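
In case anyone else wants to verify the fix the same way J-D tested it: as
far as we can tell, the roll interval he shortened is controlled by
hbase.regionserver.logroll.period, so something like the following in
hbase-site.xml should make rolls happen every few seconds while testing:

<property>
  <name>hbase.regionserver.logroll.period</name>
  <!-- default is 3600000 (one hour); lowered only for testing -->
  <value>6000</value>
</property>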

Thanks for your interest and rapid response.

On Sat, Nov 6, 2010 at 12:51 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> I resolved the problem in trunk, a patch for 0.89 can also be found
> attached in the jira. It should apply cleanly. Let me know if it works
> better for you!
>
> Thx,
>
> J-D
>
> On Fri, Nov 5, 2010 at 11:45 AM, Erdem Agaoglu <er...@gmail.com>
> wrote:
> > Previous problems probably correlate. When log split and replay fails
> > [because of the empty log], master thinks the cluster is in a clean state
> > and assigns a new .META. on some regionserver. It does not know about the
> > previous assignment of .META. thus does not split any logs for it. When
> > regionserver opens new .META., it looks to hdfs and finds some persistent
> > file for it, and since there is no recovered.edits it starts to serve.
> > Depending on the state those files 2 things happen:
> >
> > 1. If the cluster is really new (like an hour new) all the .META. data
> will
> > be on previous RS memstore. So there will be no accessible regions = lost
> > tables
> > 2. If the cluster is running for a while or gracefully restarted before,
> the
> > .META. data from the previous run would be persisted. So the regions from
> > this previous run will be accessible but not the newly generated or
> > re-assigned regions = unfinishing scans or corrupt data
> > *. No matter what, since master thinks the .META. is new, it will not
> know
> > about and unassign original regions = double assignment
> >
> > So if we are right about those assumptions, it seems it's the same
> problem.
> >
> > We've learned a lot about HBase this week, and we owe it to your
> pointers,
> > thanks.
> > I am grateful have the chance to contribute to the community in any
> way,...
> > at last :).
> >
> > On Fri, Nov 5, 2010 at 7:43 PM, Jean-Daniel Cryans <jdcryans@apache.org
> >wrote:
> >
> >> Very interesting finding Erdem! I tried it on a local empty instance
> >> and I set the roll period to 6 secs, here's what I saw:
> >>
> >> org.apache.hadoop.hbase.regionserver.LogRoller: Hlog roll period 6000ms
> >> elapsed
> >> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using
> >> syncFs -- HDFS-200
> >> org.apache.hadoop.hbase.regionserver.wal.HLog: Roll
> >> /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177
> >> %3A60020.1288977933829,
> >> entries=1, filesize=295. New hlog
> >> /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177
> >> %3A60020.1288977943913
> >> org.apache.hadoop.hbase.regionserver.wal.HLog: Found 1 hlogs to remove
> >>  out of total 1; oldest outstanding sequenceid is 270055 from region
> >> -ROOT-,,0
> >> org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog file
> >> /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177
> >> %3A60020.1288977933829
> >> whose highest sequenceid is 270054 to
> >> /hbase-89-su/.oldlogs/10.10.1.177%3A60020.1288977933829
> >>
> >> This looks like a very bad bug, I opened
> >> https://issues.apache.org/jira/browse/HBASE-3198
> >>
> >> Can you correlate this with the previous tests you did? Hopefully it's
> >> the same problem.
> >>
> >> Thanks a lot for your patience and investigative work!
> >>
> >> J-D
> >>
> >> On Fri, Nov 5, 2010 at 9:28 AM, Erdem Agaoglu <er...@gmail.com>
> >> wrote:
> >> > We have found the problem, again. But this time, we dug deeper and
> >> extracted
> >> > more info.
> >> >
> >> > First, the reason that we were unable to reproduce was trying to kill
> the
> >> RS
> >> > that holds -ROOT- too soon. That way,
> >> > 1. log split goes fine,
> >> > 2. -ROOT- gets assigned on somewhere else,
> >> > 3. log replaying for that RS goes smoothly
> >> > 4. .META. does not change.
> >> > 5. everything is normal
> >> >
> >> > But if we try after a while of running the cluster,
> >> > 1. log split fails with java.io.EOFException for the only hlog file,
> >> > 2. -ROOT- gets assigned on somewhere else,
> >> > 3. no log is replayed since none split
> >> > 4. .META. is invalid, a new .META. is designated
> >> > 5. data corrupted
> >> >
> >> > Before killing RS that holds -ROOT-, we checked the only hlog file of
> it,
> >> > and see that it contains the updates for the place of the .META. in
> the
> >> > first scenario. But in the second failing scenario, the file was
> empty.
> >> We
> >> > found that the log rolling process, which happens once an hour by
> >> default,
> >> > somehow thinks that the hlog is old and cleans it up.
> >> >
> >> > # LogRoller: Hlog roll period 3600000ms elapsed
> >> > # SequenceFileLogWriter: Using syncFs -- HDFS-200
> >> > # HLog: Roll /hbase/.logs/A/blahfile0316, entries=1, filesize=312. New
> >> hlog
> >> > /hbase/.logs/A/blahfile0476
> >> > # HLog: Found 1 hlogs to remove  out of total 1; oldest outstanding
> >> > sequenceid is 4 from region -ROOT-,,0
> >> > # HLog: moving old hlog file /hbase/.logs/A/blahfile0316 whose highest
> >> > sequenceid is 3 to /hbase/.oldlogs/blahfile0316
> >> >
> >> > In that instance, blahfile0316 contains some rows while blahfile0476
> is
> >> > empty. It very much seems like the problem is here, as it thinks the
> data
> >> in
> >> > memstore to be persisted while it is not. This seemed like a bug to us
> >> but
> >> > might as well be a misconfiguration. Any idea is greatly appreciated.
> >> >
> >> >
> >> > On Wed, Nov 3, 2010 at 4:36 PM, Erdem Agaoglu <
> erdem.agaoglu@gmail.com
> >> >wrote:
> >> >
> >> >> We suspect some misconfiguration too but are unsure what that might
> be.
> >> >> Almost all of the configuration parameters are in default along with
> >> >> dfs.support.append which is true in append branch. We checked RS logs
> >> and
> >> >> couldn't find any Exceptions, everything seems normal so it is
> unlikely
> >> to
> >> >> be a bug. The only WARN was about the last hlog file being empty but
> i
> >> guess
> >> >> this is normal.
> >> >>
> >> >> To confirm it is a append-configuration related problem, we killed
> the
> >> RS
> >> >> that holds -ROOT- right after running the cluster and checked the
> >> >> /hbase/.oldlogs/. There was a file containing a line for -ROOT- so i
> >> think
> >> >> append works correctly. The file got removed after a while.
> >> >>
> >> >> BTW, we tried to reproduce it with smaller data (to control manually,
> >> like
> >> >> 2 rows in 1 table, newly put) but didn't have any problems. I am
> >> starting to
> >> >> think the problem just disappeared. I'll increase rows to hundreds
> and
> >> check
> >> >> again.
> >> >>
> >> >> Until i get the problem again, Thanks for your time.
> >> >>
> >> >> On Tue, Nov 2, 2010 at 7:28 PM, Jean-Daniel Cryans <
> jdcryans@apache.org
> >> >wrote:
> >> >>
> >> >>> This confirms my suspicion that -ROOT-'s latest data was lost and
> that
> >> >>> .META. got double assigned, which easily explains all the weirdness
> >> >>> you're seeing after the log replay.
> >> >>>
> >> >>> The most probable reason would be misconfiguration, since the append
> >> >>> feature is very well tested and a simple kill -9 wouldn't cause all
> >> >>> sorts of issues. Else, it could be caused by an HBase bug, in which
> >> >>> case exceptions would probably come out either during log splitting
> or
> >> >>> during log replay (which happens when the region is reassigned to
> >> >>> another region server). In your case you could take a look at the RS
> >> >>> log where -ROOT- ends up and see if there's anything weird.
> >> >>>
> >> >>> To test if appends works correctly, in order to rule that out, just
> >> >>> bring up your cluster and kill -9 right away the RS that's holding
> >> >>> ROOT. Once the master did the log splitting, do a lookup directly in
> >> >>> hadoop (bin/hadoop fs -cat) on that log file which should be under
> >> >>> /hbase/.oldlogs/serverstring/somelog and look for that -ROOT- entry.
> >> >>> If it's empty or missing, it's probably a misconfiguration.
> >> >>>
> >> >>> J-D
> >> >>>
> >> >>> On Tue, Nov 2, 2010 at 9:02 AM, Erdem Agaoglu <
> erdem.agaoglu@gmail.com
> >> >
> >> >>> wrote:
> >> >>> > We reproduced the problem by kill -9ing the region server that
> hosts
> >> >>> -ROOT-
> >> >>> > again. Results are the same with what i have explained before.
> After
> >> log
> >> >>> > splitting metaScanner complained that .META. is not valid and
> >> reassigned
> >> >>> it
> >> >>> > to somewhere else. As a result, the number of rows in .META.
> reduced
> >> >>> from 24
> >> >>> > to 16 meaning we lost access to some regions and some table
> >> definitions.
> >> >>> To
> >> >>> > support what you have said about the data in the memstore, we seem
> to
> >> >>> have
> >> >>> > lost only recent regions.
> >> >>> >
> >> >>> > status 'detailed' showed no regionsInTransition, and showed a
> total
> >> of
> >> >>> 39
> >> >>> > regions assigned to region servers, but as i've said before, we
> have
> >> >>> access
> >> >>> > to only 16 of them. No need to mention our queries fail.
> >> >>> >
> >> >>> > Any idea we could try is greatly appreciated. Thanks in advance.
> >> >>> >
> >> >>> > PS. Something probably irrelevant happened this time, interface
> 60010
> >> >>> lists
> >> >>> > all our tables while 'list' command in shell lists only a few. We
> get
> >> >>> > TableNotFoundExceptions on tables shown in web interface but
> missing
> >> in
> >> >>> > 'list' command output. Seems like web interface is able to list
> >> >>> inaccessible
> >> >>> > tables.
> >> >>> >
> >> >>> > On Tue, Nov 2, 2010 at 9:56 AM, Erdem Agaoglu <
> >> erdem.agaoglu@gmail.com
> >> >>> >wrote:
> >> >>> >
> >> >>> >> By "lost tables" i mean no user table definitions were listed in
> >> >>> interface
> >> >>> >> 60010 and my queries got me TableNotFoundExceptions. Routine
> >> >>> BaseScanner
> >> >>> >> logged like
> >> >>> >>
> >> >>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region ...
> >> >>> complete
> >> >>> >>
> >> >>> >> so i guess my .META. was empty. But unfortunately I don't know if
> >> any
> >> >>> >> regions were stuck in transition, i'll be sure to check this out
> >> while
> >> >>> i try
> >> >>> >> to reproduce.
> >> >>> >>
> >> >>> >> I rechecked the logs, specifically after the splitting completed
> and
> >> it
> >> >>> >> contains lines like "Current assignment is not valid..." so i
> guess
> >> >>> this is
> >> >>> >> something unexpected. But in hopes of some configuration error
> you
> >> may
> >> >>> spot,
> >> >>> >> whole log goes like that (stripped to be more readable):
> >> >>> >>
> >> >>> >> # hlog file splitting completed in 80372 millis
> >> >>> >> # Log split complete, meta reassignment and scanning:
> >> >>> >> # ProcessServerShutdown reassigning ROOT region
> >> >>> >> # -ROOT- region unset (but not set to be reassigned)
> >> >>> >> # ROOT inserted into regionsInTransition
> >> >>> >> # Assigning for serverName=C...  total nregions to assign=1,
> regions
> >> to
> >> >>> >> give other servers than this=0, isMetaAssign=true
> >> >>> >> # Assigning serverName=C... 1 regions
> >> >>> >> # Assigning region -ROOT-... to C...
> >> >>> >> # Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
> >> >>> >> # RegionManager.rootScanner scanning meta region {server: C ...
> >> >>> >> # Current assignment of .META.... is not valid;  serverAddress=A,
> >> >>> >> startCode=blah unknown; checking once more!
> >> >>> >> # Current assignment of .META.... is not valid;  serverAddress=A,
> >> >>> >> startCode=blah unknown.
> >> >>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region
> {server:
> >> C
> >> >>> ...
> >> >>> >> complete
> >> >>> >> # RegionManager.rootScanner scanning meta region {server: C ...
> >> >>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region
> {server:
> >> C
> >> >>> ...
> >> >>> >> complete
> >> >>> >> # Assigning for serverName=E...  total nregions to assign=-1,
> >> regions
> >> >>> to
> >> >>> >> give other servers than this=2, isMetaAssign=true
> >> >>> >> # Assigning serverName=E... -1 regions
> >> >>> >> # Assigning region .META.... to E...
> >> >>> >> # Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
> >> >>> >> # Processing todo: PendingOpenOperation from E...
> >> >>> >> # .META.... open on E
> >> >>> >> # Updated row .META.... in region -ROOT- ...
> >> >>> >> # Adding to onlineMetaRegions: {server: E ...
> >> >>> >> # RegionManager.metaScanner scanning meta region {server: E ...
> >> >>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region
> {server:
> >> E
> >> >>> ...
> >> >>> >> # All 1 .META. region(s) scanned
> >> >>> >> 10 secs later
> >> >>> >> # Processing todo: ProcessServerShutdown of F
> >> >>> >> # Process shutdown of server F...: logSplit: true, rootRescanned:
> >> >>> false,
> >> >>> >> numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> >> >>> >> # Log split complete, meta reassignment and scanning
> >> >>> >> # Process server shutdown scanning root region on C
> >> >>> >> # Process server shutdown scanning root region on C finished
> master
> >> >>> >> # process server shutdown scanning .META.,,1 on E
> >> >>> >> # process server shutdown finished scanning .META.,,1 on E
> >> >>> >> # Removed F... from deadservers Map
> >> >>> >>
> >> >>> >> Thanks again.
> >> >>> >>
> >> >>> >> On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <
> >> >>> jdcryans@apache.org>wrote:
> >> >>> >>
> >> >>> >>> The fact that the tables are "revived" is a clue here IMO, but
> >> let's
> >> >>> >>> go back to more basic questions...
> >> >>> >>>
> >> >>> >>> So when you say that during step 1 you lost tables, what do you
> >> mean
> >> >>> >>> by "lost"? Were the rows of those tables still in .META.? Were
> the
> >> >>> >>> regions stuck in transition (in the shell, do: status
> 'detailed')?
> >> Or
> >> >>> >>> when you tried to query them you just got a
> TableNotFoundException?
> >> >>> >>>
> >> >>> >>> Also, the fact that only -ROOT- and not .META. was on this
> region
> >> >>> >>> server means that if there was any data lost, it would be
> .META.'s
> >> >>> >>> location and it would have been assigned somewhere else (E), but
> >> still
> >> >>> >>> stayed assigned on A. Since the data is in the memstore, recent
> >> data
> >> >>> >>> couldn't be read by this second assignment of .META. but... it
> >> could
> >> >>> >>> also be reassigned for a "normal" reason like rebalancing. The
> best
> >> >>> >>> way to confirm that is when the -ROOT- region gets reassigned at
> >> the
> >> >>> >>> end of step 1 (so this is after the message that goes like
> "...file
> >> >>> >>> splitting completed in 80372..."), do so see something like this
> in
> >> >>> >>> the master's log: "Current assignment of .META.,,some_timestamp
>  is
> >> >>> >>> not valid; serverAddress=blah, startcode=blah unknown."? If so,
> >> then
> >> >>> >>> it seems that data was lost and this is really unexpected.
> >> >>> >>>
> >> >>> >>> J-D
> >> >>> >>>
> >> >>> >>> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <
> >> >>> erdem.agaoglu@gmail.com>
> >> >>> >>> wrote:
> >> >>> >>> > Hi again,
> >> >>> >>> >
> >> >>> >>> > I have re-checked our configuration to confirm that we have
> >> >>> >>> > dfs.support.append enabled and saw "Using syncFs -- HDFS-200"
> in
> >> >>> logs. I
> >> >>> >>> > inspected logs around log splits to find something, but i'm
> not
> >> sure
> >> >>> >>> that's
> >> >>> >>> > what we are looking for. In the first step of the scenario i
> >> >>> mentioned
> >> >>> >>> > before (where we kill -9ed everything on the node that hosts
> the
> >> >>> ROOT
> >> >>> >>> > region), HLog says (stripping hdfs:// prefixes and hostnames
> for
> >> >>> >>> clarity)
> >> >>> >>> >
> >> >>> >>> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
> >> >>> >>> >
> >> >>> >>> > Then it goes over every single one like
> >> >>> >>> >
> >> >>> >>> > # Splitting hlog 1 of 7
> >> >>> >>> > # Splitting hlog 2 of 7
> >> >>> >>> > # ...
> >> >>> >>> > # Splitting hlog 7 of 7
> >> >>> >>> >
> >> >>> >>> > On the 7th hlog it WARNs with two lines
> >> >>> >>> >
> >> >>> >>> > # File .logs/F,60020,1287491528908/10.1.10.229
> >> >>> %3A60020.1288021443546
> >> >>> >>> might
> >> >>> >>> > be still open, length is 0
> >> >>> >>> > # Could not open .logs/F,60020,1287491528908/10.1.10.229
> >> >>> >>> %3A60020.1288021443546
> >> >>> >>> > for reading. File is emptyjava.io.EOFException
> >> >>> >>> >
> >> >>> >>> > And completes with
> >> >>> >>> >
> >> >>> >>> > # log file splitting completed in 80372 millis for
> >> >>> >>> > .logs/F,60020,1287491528908
> >> >>> >>> >
> >> >>> >>> > This might be it, but on the sixth step (where we kill -9ed
> the
> >> >>> >>> RegionServer
> >> >>> >>> > that hosts the only META region), it splits 2 hlogs without
> any
> >> >>> empty
> >> >>> >>> file
> >> >>> >>> > problems nor any log above INFO, but as i told before, our
> >> testtable
> >> >>> >>> still
> >> >>> >>> > got lost.
> >> >>> >>> >
> >> >>> >>> > I'll try to reproduce the problem in a cleaner way, but in the
> >> >>> meantime,
> >> >>> >>> any
> >> >>> >>> > kind of pointers to any problems we might have is greatly
> >> >>> appreciated.
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <
> >> >>> erdem.agaoglu@gmail.com
> >> >>> >>> >wrote:
> >> >>> >>> >
> >> >>> >>> >> Thanks for the answer.
> >> >>> >>> >>
> >> >>> >>> >> I am pretty sure we have dfs.support.append enabled. I
> remember
> >> >>> both
> >> >>> >>> the
> >> >>> >>> >> conf file and the logs, and don't recall seeing any errors on
> >> >>> 60010. I
> >> >>> >>> >> crawled through logs all yesterday but don't remember
> anything
> >> >>> >>> indicating a
> >> >>> >>> >> specific error too. But i'm not sure about that. Let me check
> >> that
> >> >>> and
> >> >>> >>> get
> >> >>> >>> >> back here on monday.
> >> >>> >>> >>
> >> >>> >>> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <
> >> >>> >>> jdcryans@apache.org>wrote:
> >> >>> >>> >>
> >> >>> >>> >>> First thing I'd check is if your configuration has
> >> >>> dfs.support.append,
> >> >>> >>> >>> you can confirm this by looking at your region server logs.
> >> When a
> >> >>> RS
> >> >>> >>> >>> starts, it creates an HLog and will print out: "Using syncFs
> --
> >> >>> >>> >>> HDFS-200" if it's configured, else you'll see "syncFs --
> >> HDFS-200
> >> >>> --
> >> >>> >>> >>> not available, dfs.support.append=false". Also the master
> web
> >> ui
> >> >>> (on
> >> >>> >>> >>> port 60010) will print an error message regarding that.
> >> >>> >>> >>>
> >> >>> >>> >>> If it's all ok, then you should take a look at the master
> log
> >> when
> >> >>> it
> >> >>> >>> >>> does the log splitting and see if it contains any obvious
> >> errors.
> >> >>> >>> >>>
> >> >>> >>> >>> J-D
> >> >>> >>> >>>
> >> >>> >>> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <
> >> >>> >>> erdem.agaoglu@gmail.com>
> >> >>> >>> >>> wrote:
> >> >>> >>> >>> > Hi all,
> >> >>> >>> >>> >
> >> >>> >>> >>> > We have a testing cluster of 6 nodes which we try to run
> an
> >> >>> >>> >>> HBase/MapReduce
> >> >>> >>> >>> > application on. In order to simulate a power failure we
> kill
> >> >>> -9ed
> >> >>> >>> all
> >> >>> >>> >>> things
> >> >>> >>> >>> > hadoop related on one of the slave nodes (DataNode,
> >> >>> RegionServer,
> >> >>> >>> >>> > TaskTracker, ZK quorum peer and i think SecondaryNameNode
> was
> >> on
> >> >>> >>> this
> >> >>> >>> >>> node
> >> >>> >>> >>> > too). We were expecting a smooth transition on all
> services
> >> but
> >> >>> were
> >> >>> >>> >>> unable
> >> >>> >>> >>> > to get on HBase end. While our regions seemed intact (not
> >> >>> >>> confirmed), we
> >> >>> >>> >>> > lost table definitions that pointed some kind of META
> region
> >> >>> fail.
> >> >>> >>> So
> >> >>> >>> >>> our
> >> >>> >>> >>> > application failed with several TableNotFoundExceptions.
> >> >>> Simulation
> >> >>> >>> was
> >> >>> >>> >>> > conducted with no-load and extremely small data (like 10
> rows
> >> in
> >> >>> 3
> >> >>> >>> >>> tables).
> >> >>> >>> >>> >
> >> >>> >>> >>> > On our setup, HBase is 0.89.20100924, r1001068 while
> Hadoop
> >> >>> >>> >>> > runs 0.20.3-append-r964955-1240, r960957. Most of the
> >> >>> configuration
> >> >>> >>> >>> > parameters are in default.
> >> >>> >>> >>> >
> >> >>> >>> >>> > If we did something wrong up to this point, please ignore
> the
> >> >>> rest
> >> >>> >>> of
> >> >>> >>> >>> the
> >> >>> >>> >>> > message as i'll try to explain what we did to reproduce it
> >> and
> >> >>> might
> >> >>> >>> be
> >> >>> >>> >>> > irrelevant.
> >> >>> >>> >>> >
> >> >>> >>> >>> > Say the machines are named A, B, C, D, E, F; where A is
> >> >>> master-like
> >> >>> >>> >>> node,
> >> >>> >>> >>> > others are slaves and power fail is on F. Since we have
> >> little
> >> >>> data,
> >> >>> >>> we
> >> >>> >>> >>> have
> >> >>> >>> >>> > one ROOT and only one META region. I'll try to sum up the
> >> whole
> >> >>> >>> >>> scenario.
> >> >>> >>> >>> >
> >> >>> >>> >>> > A: NN, DN, JT, TT, HM, RS
> >> >>> >>> >>> > B: DN, TT, RS, ZK
> >> >>> >>> >>> > C: DN, TT, RS, ZK
> >> >>> >>> >>> > D: DN, TT, RS, ZK
> >> >>> >>> >>> > E: DN, TT, RS, ZK
> >> >>> >>> >>> > F: SNN, DN, TT, RS, ZK
> >> >>> >>> >>> >
> >> >>> >>> >>> > 0. Initial state -> ROOT: F, META: A
> >> >>> >>> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables,
> waited
> >> >>> for
> >> >>> >>> about
> >> >>> >>> >>> half
> >> >>> >>> >>> > an hour to get nothing BTW
> >> >>> >>> >>> > 2. Put F back online -> No effect
> >> >>> >>> >>> > 3. Create a table 'testtable' to see if we lose it
> >> >>> >>> >>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
> >> >>> >>> >>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it
> again
> >> >>> >>> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We
> lost
> >> >>> >>> >>> 'testtable'
> >> >>> >>> >>> > but get our tables from before the simulation. It seemed
> like
> >> >>> >>> because A
> >> >>> >>> >>> had
> >> >>> >>> >>> > META before the simulation, the table definitions were
> >> revived.
> >> >>> >>> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We
> lost
> >> 2
> >> >>> out
> >> >>> >>> of
> >> >>> >>> >>> our
> >> >>> >>> >>> > original 6 tables, 'testtable' revived. That small data
> seems
> >> >>> >>> corrupted
> >> >>> >>> >>> too
> >> >>> >>> >>> > as our Scans don't finish.
> >> >>> >>> >>> > 8. Run to mailing-list.
> >> >>> >>> >>> >
> >> >>> >>> >>> > First of all thanks for reading up to this point. From
> what
> >> we
> >> >>> are
> >> >>> >>> now,
> >> >>> >>> >>> we
> >> >>> >>> >>> > are not even sure if this is the expected behavior, like
> if
> >> ROOT
> >> >>> or
> >> >>> >>> META
> >> >>> >>> >>> > region dies we lose data and must do sth like hbck, or if
> we
> >> are
> >> >>> >>> missing
> >> >>> >>> >>> a
> >> >>> >>> >>> > configuration, or if this is a bug. No need to mention
> that
> >> we
> >> >>> are
> >> >>> >>> >>> > relatively new to HBase so the last possibility is that if
> we
> >> >>> didn't
> >> >>> >>> >>> > understand it at all.
> >> >>> >>> >>> >
> >> >>> >>> >>> > Thanks in advance for any ideas.
> >> >>> >>> >>> >
> >> >>> >>> >>> > --
> >> >>> >>> >>> > erdem agaoglu
> >> >>> >>> >>> >
> >> >>> >>> >>>
> >> >>> >>> >>
> >> >>> >>> >>
> >> >>> >>> >>
> >> >>> >>> >> --
> >> >>> >>> >> erdem agaoglu
> >> >>> >>> >>
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> > --
> >> >>> >>> > erdem agaoglu
> >> >>> >>> >
> >> >>> >>>
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> --
> >> >>> >> erdem agaoglu
> >> >>> >>
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > --
> >> >>> > erdem agaoglu
> >> >>> >
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> erdem agaoglu
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > erdem agaoglu
> >> >
> >>
> >
> >
> >
> > --
> > erdem agaoglu
> >
>



-- 
erdem agaoglu

Re: Node failure causes weird META data?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I resolved the problem in trunk, a patch for 0.89 can also be found
attached in the jira. It should apply cleanly. Let me know if it works
better for you!

Thx,

J-D

On Fri, Nov 5, 2010 at 11:45 AM, Erdem Agaoglu <er...@gmail.com> wrote:
> Previous problems probably correlate. When log split and replay fails
> [because of the empty log], master thinks the cluster is in a clean state
> and assigns a new .META. on some regionserver. It does not know about the
> previous assignment of .META. thus does not split any logs for it. When
> regionserver opens new .META., it looks to hdfs and finds some persistent
> file for it, and since there is no recovered.edits it starts to serve.
> Depending on the state those files 2 things happen:
>
> 1. If the cluster is really new (like an hour new) all the .META. data will
> be on previous RS memstore. So there will be no accessible regions = lost
> tables
> 2. If the cluster is running for a while or gracefully restarted before, the
> .META. data from the previous run would be persisted. So the regions from
> this previous run will be accessible but not the newly generated or
> re-assigned regions = unfinishing scans or corrupt data
> *. No matter what, since master thinks the .META. is new, it will not know
> about and unassign original regions = double assignment
>
> So if we are right about those assumptions, it seems it's the same problem.
>
> We've learned a lot about HBase this week, and we owe it to your pointers,
> thanks.
> I am grateful have the chance to contribute to the community in any way,...
> at last :).
>
> On Fri, Nov 5, 2010 at 7:43 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Very interesting finding Erdem! I tried it on a local empty instance
>> and I set the roll period to 6 secs, here's what I saw:
>>
>> org.apache.hadoop.hbase.regionserver.LogRoller: Hlog roll period 6000ms
>> elapsed
>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using
>> syncFs -- HDFS-200
>> org.apache.hadoop.hbase.regionserver.wal.HLog: Roll
>> /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177
>> %3A60020.1288977933829,
>> entries=1, filesize=295. New hlog
>> /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177
>> %3A60020.1288977943913
>> org.apache.hadoop.hbase.regionserver.wal.HLog: Found 1 hlogs to remove
>>  out of total 1; oldest outstanding sequenceid is 270055 from region
>> -ROOT-,,0
>> org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog file
>> /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177
>> %3A60020.1288977933829
>> whose highest sequenceid is 270054 to
>> /hbase-89-su/.oldlogs/10.10.1.177%3A60020.1288977933829
>>
>> This looks like a very bad bug, I opened
>> https://issues.apache.org/jira/browse/HBASE-3198
>>
>> Can you correlate this with the previous tests you did? Hopefully it's
>> the same problem.
>>
>> Thanks a lot for your patience and investigative work!
>>
>> J-D
>>
>> On Fri, Nov 5, 2010 at 9:28 AM, Erdem Agaoglu <er...@gmail.com>
>> wrote:
>> > We have found the problem, again. But this time, we dug deeper and
>> extracted
>> > more info.
>> >
>> > First, the reason that we were unable to reproduce was trying to kill the
>> RS
>> > that holds -ROOT- too soon. That way,
>> > 1. log split goes fine,
>> > 2. -ROOT- gets assigned on somewhere else,
>> > 3. log replaying for that RS goes smoothly
>> > 4. .META. does not change.
>> > 5. everything is normal
>> >
>> > But if we try after a while of running the cluster,
>> > 1. log split fails with java.io.EOFException for the only hlog file,
>> > 2. -ROOT- gets assigned on somewhere else,
>> > 3. no log is replayed since none split
>> > 4. .META. is invalid, a new .META. is designated
>> > 5. data corrupted
>> >
>> > Before killing RS that holds -ROOT-, we checked the only hlog file of it,
>> > and see that it contains the updates for the place of the .META. in the
>> > first scenario. But in the second failing scenario, the file was empty.
>> We
>> > found that the log rolling process, which happens once an hour by
>> default,
>> > somehow thinks that the hlog is old and cleans it up.
>> >
>> > # LogRoller: Hlog roll period 3600000ms elapsed
>> > # SequenceFileLogWriter: Using syncFs -- HDFS-200
>> > # HLog: Roll /hbase/.logs/A/blahfile0316, entries=1, filesize=312. New
>> hlog
>> > /hbase/.logs/A/blahfile0476
>> > # HLog: Found 1 hlogs to remove  out of total 1; oldest outstanding
>> > sequenceid is 4 from region -ROOT-,,0
>> > # HLog: moving old hlog file /hbase/.logs/A/blahfile0316 whose highest
>> > sequenceid is 3 to /hbase/.oldlogs/blahfile0316
>> >
>> > In that instance, blahfile0316 contains some rows while blahfile0476 is
>> > empty. It very much seems like the problem is here, as it thinks the data
>> in
>> > memstore to be persisted while it is not. This seemed like a bug to us
>> but
>> > might as well be a misconfiguration. Any idea is greatly appreciated.
>> >
>> >
>> > On Wed, Nov 3, 2010 at 4:36 PM, Erdem Agaoglu <erdem.agaoglu@gmail.com
>> >wrote:
>> >
>> >> We suspect some misconfiguration too but are unsure what that might be.
>> >> Almost all of the configuration parameters are in default along with
>> >> dfs.support.append which is true in append branch. We checked RS logs
>> and
>> >> couldn't find any Exceptions, everything seems normal so it is unlikely
>> to
>> >> be a bug. The only WARN was about the last hlog file being empty but i
>> guess
>> >> this is normal.
>> >>
>> >> To confirm it is a append-configuration related problem, we killed the
>> RS
>> >> that holds -ROOT- right after running the cluster and checked the
>> >> /hbase/.oldlogs/. There was a file containing a line for -ROOT- so i
>> think
>> >> append works correctly. The file got removed after a while.
>> >>
>> >> BTW, we tried to reproduce it with smaller data (to control manually,
>> like
>> >> 2 rows in 1 table, newly put) but didn't have any problems. I am
>> starting to
>> >> think the problem just disappeared. I'll increase rows to hundreds and
>> check
>> >> again.
>> >>
>> >> Until i get the problem again, Thanks for your time.
>> >>
>> >> On Tue, Nov 2, 2010 at 7:28 PM, Jean-Daniel Cryans <jdcryans@apache.org
>> >wrote:
>> >>
>> >>> This confirms my suspicion that -ROOT-'s latest data was lost and that
>> >>> .META. got double assigned, which easily explains all the weirdness
>> >>> you're seeing after the log replay.
>> >>>
>> >>> The most probable reason would be misconfiguration, since the append
>> >>> feature is very well tested and a simple kill -9 wouldn't cause all
>> >>> sorts of issues. Else, it could be caused by an HBase bug, in which
>> >>> case exceptions would probably come out either during log splitting or
>> >>> during log replay (which happens when the region is reassigned to
>> >>> another region server). In your case you could take a look at the RS
>> >>> log where -ROOT- ends up and see if there's anything weird.
>> >>>
>> >>> To test if appends works correctly, in order to rule that out, just
>> >>> bring up your cluster and kill -9 right away the RS that's holding
>> >>> ROOT. Once the master did the log splitting, do a lookup directly in
>> >>> hadoop (bin/hadoop fs -cat) on that log file which should be under
>> >>> /hbase/.oldlogs/serverstring/somelog and look for that -ROOT- entry.
>> >>> If it's empty or missing, it's probably a misconfiguration.
>> >>>
>> >>> J-D
>> >>>
>> >>> On Tue, Nov 2, 2010 at 9:02 AM, Erdem Agaoglu <erdem.agaoglu@gmail.com
>> >
>> >>> wrote:
>> >>> > We reproduced the problem by kill -9ing the region server that hosts
>> >>> -ROOT-
>> >>> > again. Results are the same with what i have explained before. After
>> log
>> >>> > splitting metaScanner complained that .META. is not valid and
>> reassigned
>> >>> it
>> >>> > to somewhere else. As a result, the number of rows in .META. reduced
>> >>> from 24
>> >>> > to 16 meaning we lost access to some regions and some table
>> definitions.
>> >>> To
>> >>> > support what you have said about the data in the memstore, we seem to
>> >>> have
>> >>> > lost only recent regions.
>> >>> >
>> >>> > status 'detailed' showed no regionsInTransition, and showed a total
>> of
>> >>> 39
>> >>> > regions assigned to region servers, but as i've said before, we have
>> >>> access
>> >>> > to only 16 of them. No need to mention our queries fail.
>> >>> >
>> >>> > Any idea we could try is greatly appreciated. Thanks in advance.
>> >>> >
>> >>> > PS. Something probably irrelevant happened this time, interface 60010
>> >>> lists
>> >>> > all our tables while 'list' command in shell lists only a few. We get
>> >>> > TableNotFoundExceptions on tables shown in web interface but missing
>> in
>> >>> > 'list' command output. Seems like web interface is able to list
>> >>> inaccessible
>> >>> > tables.
>> >>> >
>> >>> > On Tue, Nov 2, 2010 at 9:56 AM, Erdem Agaoglu <
>> erdem.agaoglu@gmail.com
>> >>> >wrote:
>> >>> >
>> >>> >> By "lost tables" i mean no user table definitions were listed in
>> >>> interface
>> >>> >> 60010 and my queries got me TableNotFoundExceptions. Routine
>> >>> BaseScanner
>> >>> >> logged like
>> >>> >>
>> >>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region ...
>> >>> complete
>> >>> >>
>> >>> >> so i guess my .META. was empty. But unfortunately I don't know if
>> any
>> >>> >> regions were stuck in transition, i'll be sure to check this out
>> while
>> >>> i try
>> >>> >> to reproduce.
>> >>> >>
>> >>> >> I rechecked the logs, specifically after the splitting completed and
>> it
>> >>> >> contains lines like "Current assignment is not valid..." so i guess
>> >>> this is
>> >>> >> something unexpected. But in hopes of some configuration error you
>> may
>> >>> spot,
>> >>> >> whole log goes like that (stripped to be more readable):
>> >>> >>
>> >>> >> # hlog file splitting completed in 80372 millis
>> >>> >> # Log split complete, meta reassignment and scanning:
>> >>> >> # ProcessServerShutdown reassigning ROOT region
>> >>> >> # -ROOT- region unset (but not set to be reassigned)
>> >>> >> # ROOT inserted into regionsInTransition
>> >>> >> # Assigning for serverName=C...  total nregions to assign=1, regions
>> to
>> >>> >> give other servers than this=0, isMetaAssign=true
>> >>> >> # Assigning serverName=C... 1 regions
>> >>> >> # Assigning region -ROOT-... to C...
>> >>> >> # Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
>> >>> >> # RegionManager.rootScanner scanning meta region {server: C ...
>> >>> >> # Current assignment of .META.... is not valid;  serverAddress=A,
>> >>> >> startCode=blah unknown; checking once more!
>> >>> >> # Current assignment of .META.... is not valid;  serverAddress=A,
>> >>> >> startCode=blah unknown.
>> >>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server:
>> C
>> >>> ...
>> >>> >> complete
>> >>> >> # RegionManager.rootScanner scanning meta region {server: C ...
>> >>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server:
>> C
>> >>> ...
>> >>> >> complete
>> >>> >> # Assigning for serverName=E...  total nregions to assign=-1,
>> regions
>> >>> to
>> >>> >> give other servers than this=2, isMetaAssign=true
>> >>> >> # Assigning serverName=E... -1 regions
>> >>> >> # Assigning region .META.... to E...
>> >>> >> # Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
>> >>> >> # Processing todo: PendingOpenOperation from E...
>> >>> >> # .META.... open on E
>> >>> >> # Updated row .META.... in region -ROOT- ...
>> >>> >> # Adding to onlineMetaRegions: {server: E ...
>> >>> >> # RegionManager.metaScanner scanning meta region {server: E ...
>> >>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region {server:
>> E
>> >>> ...
>> >>> >> # All 1 .META. region(s) scanned
>> >>> >> 10 secs later
>> >>> >> # Processing todo: ProcessServerShutdown of F
>> >>> >> # Process shutdown of server F...: logSplit: true, rootRescanned:
>> >>> false,
>> >>> >> numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
>> >>> >> # Log split complete, meta reassignment and scanning
>> >>> >> # Process server shutdown scanning root region on C
>> >>> >> # Process server shutdown scanning root region on C finished master
>> >>> >> # process server shutdown scanning .META.,,1 on E
>> >>> >> # process server shutdown finished scanning .META.,,1 on E
>> >>> >> # Removed F... from deadservers Map
>> >>> >>
>> >>> >> Thanks again.
>> >>> >>
>> >>> >> On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <
>> >>> jdcryans@apache.org>wrote:
>> >>> >>
>> >>> >>> The fact that the tables are "revived" is a clue here IMO, but
>> let's
>> >>> >>> go back to more basic questions...
>> >>> >>>
>> >>> >>> So when you say that during step 1 you lost tables, what do you
>> mean
>> >>> >>> by "lost"? Were the rows of those tables still in .META.? Were the
>> >>> >>> regions stuck in transition (in the shell, do: status 'detailed')?
>> Or
>> >>> >>> when you tried to query them you just got a TableNotFoundException?
>> >>> >>>
>> >>> >>> Also, the fact that only -ROOT- and not .META. was on this region
>> >>> >>> server means that if there was any data lost, it would be .META.'s
>> >>> >>> location and it would have been assigned somewhere else (E), but
>> still
>> >>> >>> stayed assigned on A. Since the data is in the memstore, recent
>> data
>> >>> >>> couldn't be read by this second assignment of .META. but... it
>> could
>> >>> >>> also be reassigned for a "normal" reason like rebalancing. The best
>> >>> >>> way to confirm that is when the -ROOT- region gets reassigned at
>> the
>> >>> >>> end of step 1 (so this is after the message that goes like "...file
>> >>> >>> splitting completed in 80372..."), do so see something like this in
>> >>> >>> the master's log: "Current assignment of .META.,,some_timestamp  is
>> >>> >>> not valid; serverAddress=blah, startcode=blah unknown."? If so,
>> then
>> >>> >>> it seems that data was lost and this is really unexpected.
>> >>> >>>
>> >>> >>> J-D
>> >>> >>>
>> >>> >>> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <
>> >>> erdem.agaoglu@gmail.com>
>> >>> >>> wrote:
>> >>> >>> > Hi again,
>> >>> >>> >
>> >>> >>> > I have re-checked our configuration to confirm that we have
>> >>> >>> > dfs.support.append enabled and saw "Using syncFs -- HDFS-200" in
>> >>> logs. I
>> >>> >>> > inspected logs around log splits to find something, but i'm not
>> sure
>> >>> >>> that's
>> >>> >>> > what we are looking for. In the first step of the scenario i
>> >>> mentioned
>> >>> >>> > before (where we kill -9ed everything on the node that hosts the
>> >>> ROOT
>> >>> >>> > region), HLog says (stripping hdfs:// prefixes and hostnames for
>> >>> >>> clarity)
>> >>> >>> >
>> >>> >>> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
>> >>> >>> >
>> >>> >>> > Then it goes over every single one like
>> >>> >>> >
>> >>> >>> > # Splitting hlog 1 of 7
>> >>> >>> > # Splitting hlog 2 of 7
>> >>> >>> > # ...
>> >>> >>> > # Splitting hlog 7 of 7
>> >>> >>> >
>> >>> >>> > On the 7th hlog it WARNs with two lines
>> >>> >>> >
>> >>> >>> > # File .logs/F,60020,1287491528908/10.1.10.229
>> >>> %3A60020.1288021443546
>> >>> >>> might
>> >>> >>> > be still open, length is 0
>> >>> >>> > # Could not open .logs/F,60020,1287491528908/10.1.10.229
>> >>> >>> %3A60020.1288021443546
>> >>> >>> > for reading. File is emptyjava.io.EOFException
>> >>> >>> >
>> >>> >>> > And completes with
>> >>> >>> >
>> >>> >>> > # log file splitting completed in 80372 millis for
>> >>> >>> > .logs/F,60020,1287491528908
>> >>> >>> >
>> >>> >>> > This might be it, but on the sixth step (where we kill -9ed the
>> >>> >>> RegionServer
>> >>> >>> > that hosts the only META region), it splits 2 hlogs without any
>> >>> empty
>> >>> >>> file
>> >>> >>> > problems nor any log above INFO, but as i told before, our
>> testtable
>> >>> >>> still
>> >>> >>> > got lost.
>> >>> >>> >
>> >>> >>> > I'll try to reproduce the problem in a cleaner way, but in the
>> >>> meantime,
>> >>> >>> any
>> >>> >>> > kind of pointers to any problems we might have is greatly
>> >>> appreciated.
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <
>> >>> erdem.agaoglu@gmail.com
>> >>> >>> >wrote:
>> >>> >>> >
>> >>> >>> >> Thanks for the answer.
>> >>> >>> >>
>> >>> >>> >> I am pretty sure we have dfs.support.append enabled. I remember
>> >>> both
>> >>> >>> the
>> >>> >>> >> conf file and the logs, and don't recall seeing any errors on
>> >>> 60010. I
>> >>> >>> >> crawled through logs all yesterday but don't remember anything
>> >>> >>> indicating a
>> >>> >>> >> specific error too. But i'm not sure about that. Let me check
>> that
>> >>> and
>> >>> >>> get
>> >>> >>> >> back here on monday.
>> >>> >>> >>
>> >>> >>> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <
>> >>> >>> jdcryans@apache.org>wrote:
>> >>> >>> >>
>> >>> >>> >>> First thing I'd check is if your configuration has
>> >>> dfs.support.append,
>> >>> >>> >>> you can confirm this by looking at your region server logs.
>> When a
>> >>> RS
>> >>> >>> >>> starts, it creates an HLog and will print out: "Using syncFs --
>> >>> >>> >>> HDFS-200" if it's configured, else you'll see "syncFs --
>> HDFS-200
>> >>> --
>> >>> >>> >>> not available, dfs.support.append=false". Also the master web
>> ui
>> >>> (on
>> >>> >>> >>> port 60010) will print an error message regarding that.
>> >>> >>> >>>
>> >>> >>> >>> If it's all ok, then you should take a look at the master log
>> when
>> >>> it
>> >>> >>> >>> does the log splitting and see if it contains any obvious
>> errors.
>> >>> >>> >>>
>> >>> >>> >>> J-D
>> >>> >>> >>>
>> >>> >>> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <
>> >>> >>> erdem.agaoglu@gmail.com>
>> >>> >>> >>> wrote:
>> >>> >>> >>> > Hi all,
>> >>> >>> >>> >
>> >>> >>> >>> > We have a testing cluster of 6 nodes which we try to run an
>> >>> >>> >>> HBase/MapReduce
>> >>> >>> >>> > application on. In order to simulate a power failure we kill
>> >>> -9ed
>> >>> >>> all
>> >>> >>> >>> things
>> >>> >>> >>> > hadoop related on one of the slave nodes (DataNode,
>> >>> RegionServer,
>> >>> >>> >>> > TaskTracker, ZK quorum peer and i think SecondaryNameNode was
>> on
>> >>> >>> this
>> >>> >>> >>> node
>> >>> >>> >>> > too). We were expecting a smooth transition on all services
>> but
>> >>> were
>> >>> >>> >>> unable
>> >>> >>> >>> > to get on HBase end. While our regions seemed intact (not
>> >>> >>> confirmed), we
>> >>> >>> >>> > lost table definitions that pointed some kind of META region
>> >>> fail.
>> >>> >>> So
>> >>> >>> >>> our
>> >>> >>> >>> > application failed with several TableNotFoundExceptions.
>> >>> Simulation
>> >>> >>> was
>> >>> >>> >>> > conducted with no-load and extremely small data (like 10 rows
>> in
>> >>> 3
>> >>> >>> >>> tables).
>> >>> >>> >>> >
>> >>> >>> >>> > On our setup, HBase is 0.89.20100924, r1001068 while Hadoop
>> >>> >>> >>> > runs 0.20.3-append-r964955-1240, r960957. Most of the
>> >>> configuration
>> >>> >>> >>> > parameters are in default.
>> >>> >>> >>> >
>> >>> >>> >>> > If we did something wrong up to this point, please ignore the
>> >>> rest
>> >>> >>> of
>> >>> >>> >>> the
>> >>> >>> >>> > message as i'll try to explain what we did to reproduce it
>> and
>> >>> might
>> >>> >>> be
>> >>> >>> >>> > irrelevant.
>> >>> >>> >>> >
>> >>> >>> >>> > Say the machines are named A, B, C, D, E, F; where A is
>> >>> master-like
>> >>> >>> >>> node,
>> >>> >>> >>> > others are slaves and power fail is on F. Since we have
>> little
>> >>> data,
>> >>> >>> we
>> >>> >>> >>> have
>> >>> >>> >>> > one ROOT and only one META region. I'll try to sum up the
>> whole
>> >>> >>> >>> scenario.
>> >>> >>> >>> >
>> >>> >>> >>> > A: NN, DN, JT, TT, HM, RS
>> >>> >>> >>> > B: DN, TT, RS, ZK
>> >>> >>> >>> > C: DN, TT, RS, ZK
>> >>> >>> >>> > D: DN, TT, RS, ZK
>> >>> >>> >>> > E: DN, TT, RS, ZK
>> >>> >>> >>> > F: SNN, DN, TT, RS, ZK
>> >>> >>> >>> >
>> >>> >>> >>> > 0. Initial state -> ROOT: F, META: A
>> >>> >>> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables, waited
>> >>> for
>> >>> >>> about
>> >>> >>> >>> half
>> >>> >>> >>> > an hour to get nothing BTW
>> >>> >>> >>> > 2. Put F back online -> No effect
>> >>> >>> >>> > 3. Create a table 'testtable' to see if we lose it
>> >>> >>> >>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
>> >>> >>> >>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
>> >>> >>> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost
>> >>> >>> >>> 'testtable'
>> >>> >>> >>> > but get our tables from before the simulation. It seemed like
>> >>> >>> because A
>> >>> >>> >>> had
>> >>> >>> >>> > META before the simulation, the table definitions were
>> revived.
>> >>> >>> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost
>> 2
>> >>> out
>> >>> >>> of
>> >>> >>> >>> our
>> >>> >>> >>> > original 6 tables, 'testtable' revived. That small data seems
>> >>> >>> corrupted
>> >>> >>> >>> too
>> >>> >>> >>> > as our Scans don't finish.
>> >>> >>> >>> > 8. Run to mailing-list.
>> >>> >>> >>> >
>> >>> >>> >>> > First of all thanks for reading up to this point. From what
>> we
>> >>> are
>> >>> >>> now,
>> >>> >>> >>> we
>> >>> >>> >>> > are not even sure if this is the expected behavior, like if
>> ROOT
>> >>> or
>> >>> >>> META
>> >>> >>> >>> > region dies we lose data and must do sth like hbck, or if we
>> are
>> >>> >>> missing
>> >>> >>> >>> a
>> >>> >>> >>> > configuration, or if this is a bug. No need to mention that
>> we
>> >>> are
>> >>> >>> >>> > relatively new to HBase so the last possibility is that if we
>> >>> didn't
>> >>> >>> >>> > understand it at all.
>> >>> >>> >>> >
>> >>> >>> >>> > Thanks in advance for any ideas.
>> >>> >>> >>> >
>> >>> >>> >>> > --
>> >>> >>> >>> > erdem agaoglu
>> >>> >>> >>> >
>> >>> >>> >>>
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> --
>> >>> >>> >> erdem agaoglu
>> >>> >>> >>
>> >>> >>> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > --
>> >>> >>> > erdem agaoglu
>> >>> >>> >
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> erdem agaoglu
>> >>> >>
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > erdem agaoglu
>> >>> >
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> erdem agaoglu
>> >>
>> >
>> >
>> >
>> > --
>> > erdem agaoglu
>> >
>>
>
>
>
> --
> erdem agaoglu
>

Re: Node failure causes weird META data?

Posted by Erdem Agaoglu <er...@gmail.com>.
The previous problems probably correlate. When the log split and replay fail
[because of the empty log], the master thinks the cluster is in a clean state
and assigns a new .META. on some regionserver. It does not know about the
previous assignment of .META. and thus does not split any logs for it. When
the regionserver opens the new .META., it looks in HDFS, finds some persisted
files for it, and since there is no recovered.edits it starts to serve.
Depending on the state of those files, 2 things can happen:

1. If the cluster is really new (like an hour new), all the .META. data will
still be in the previous RS's memstore, so there will be no accessible
regions = lost tables
2. If the cluster has been running for a while or was gracefully restarted
before, the .META. data from the previous run will have been persisted, so
the regions from that previous run will be accessible but not the newly
generated or re-assigned regions = never-finishing scans or corrupt data
*. No matter what, since the master thinks the .META. is new, it will not
know about the originally assigned regions and will never unassign them =
double assignment

So if we are right about those assumptions, it seems it's the same problem.
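
If it helps to picture our suspicion, the archiving decision in the
LogRoller/HLog lines quoted earlier in the thread seems to boil down to a
comparison like the one below (an illustrative sketch with made-up names, not
the actual HLog code):

public class ArchiveCheckSketch {
  // A rolled hlog is treated as removable when every edit in it is "older"
  // than the oldest edit still sitting unflushed in some memstore.
  static boolean safeToArchive(long highestSeqIdInOldLog,
                               long oldestOutstandingSeqId) {
    return highestSeqIdInOldLog < oldestOutstandingSeqId;
  }

  public static void main(String[] args) {
    // Numbers from our failing run: the only hlog's highest sequenceid was 3,
    // while the "oldest outstanding" sequenceid reported for -ROOT- was 4, so
    // the log holding the unflushed .META. location edit got moved to .oldlogs.
    System.out.println(safeToArchive(3, 4));   // prints true
  }
}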

We've learned a lot about HBase this week, and we owe it to your pointers;
thanks.
I am grateful to have the chance to contribute to the community in some way
at last :).

On Fri, Nov 5, 2010 at 7:43 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> Very interesting finding Erdem! I tried it on a local empty instance
> and I set the roll period to 6 secs, here's what I saw:
>
> org.apache.hadoop.hbase.regionserver.LogRoller: Hlog roll period 6000ms
> elapsed
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using
> syncFs -- HDFS-200
> org.apache.hadoop.hbase.regionserver.wal.HLog: Roll
> /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177
> %3A60020.1288977933829,
> entries=1, filesize=295. New hlog
> /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177
> %3A60020.1288977943913
> org.apache.hadoop.hbase.regionserver.wal.HLog: Found 1 hlogs to remove
>  out of total 1; oldest outstanding sequenceid is 270055 from region
> -ROOT-,,0
> org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog file
> /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177
> %3A60020.1288977933829
> whose highest sequenceid is 270054 to
> /hbase-89-su/.oldlogs/10.10.1.177%3A60020.1288977933829
>
> This looks like a very bad bug, I opened
> https://issues.apache.org/jira/browse/HBASE-3198
>
> Can you correlate this with the previous tests you did? Hopefully it's
> the same problem.
>
> Thanks a lot for your patience and investigative work!
>
> J-D
>
> On Fri, Nov 5, 2010 at 9:28 AM, Erdem Agaoglu <er...@gmail.com>
> wrote:
> > We have found the problem, again. But this time, we dug deeper and
> extracted
> > more info.
> >
> > First, the reason that we were unable to reproduce was trying to kill the
> RS
> > that holds -ROOT- too soon. That way,
> > 1. log split goes fine,
> > 2. -ROOT- gets assigned on somewhere else,
> > 3. log replaying for that RS goes smoothly
> > 4. .META. does not change.
> > 5. everything is normal
> >
> > But if we try after a while of running the cluster,
> > 1. log split fails with java.io.EOFException for the only hlog file,
> > 2. -ROOT- gets assigned on somewhere else,
> > 3. no log is replayed since none split
> > 4. .META. is invalid, a new .META. is designated
> > 5. data corrupted
> >
> > Before killing RS that holds -ROOT-, we checked the only hlog file of it,
> > and see that it contains the updates for the place of the .META. in the
> > first scenario. But in the second failing scenario, the file was empty.
> We
> > found that the log rolling process, which happens once an hour by
> default,
> > somehow thinks that the hlog is old and cleans it up.
> >
> > # LogRoller: Hlog roll period 3600000ms elapsed
> > # SequenceFileLogWriter: Using syncFs -- HDFS-200
> > # HLog: Roll /hbase/.logs/A/blahfile0316, entries=1, filesize=312. New
> hlog
> > /hbase/.logs/A/blahfile0476
> > # HLog: Found 1 hlogs to remove  out of total 1; oldest outstanding
> > sequenceid is 4 from region -ROOT-,,0
> > # HLog: moving old hlog file /hbase/.logs/A/blahfile0316 whose highest
> > sequenceid is 3 to /hbase/.oldlogs/blahfile0316
> >
> > In that instance, blahfile0316 contains some rows while blahfile0476 is
> > empty. It very much seems like the problem is here, as it thinks the data
> in
> > memstore to be persisted while it is not. This seemed like a bug to us
> but
> > might as well be a misconfiguration. Any idea is greatly appreciated.
> >
> >
> > On Wed, Nov 3, 2010 at 4:36 PM, Erdem Agaoglu <erdem.agaoglu@gmail.com
> >wrote:
> >
> >> We suspect some misconfiguration too but are unsure what that might be.
> >> Almost all of the configuration parameters are in default along with
> >> dfs.support.append which is true in append branch. We checked RS logs
> and
> >> couldn't find any Exceptions, everything seems normal so it is unlikely
> to
> >> be a bug. The only WARN was about the last hlog file being empty but i
> guess
> >> this is normal.
> >>
> >> To confirm it is a append-configuration related problem, we killed the
> RS
> >> that holds -ROOT- right after running the cluster and checked the
> >> /hbase/.oldlogs/. There was a file containing a line for -ROOT- so i
> think
> >> append works correctly. The file got removed after a while.
> >>
> >> BTW, we tried to reproduce it with smaller data (to control manually,
> like
> >> 2 rows in 1 table, newly put) but didn't have any problems. I am
> starting to
> >> think the problem just disappeared. I'll increase rows to hundreds and
> check
> >> again.
> >>
> >> Until i get the problem again, Thanks for your time.
> >>
> >> On Tue, Nov 2, 2010 at 7:28 PM, Jean-Daniel Cryans <jdcryans@apache.org
> >wrote:
> >>
> >>> This confirms my suspicion that -ROOT-'s latest data was lost and that
> >>> .META. got double assigned, which easily explains all the weirdness
> >>> you're seeing after the log replay.
> >>>
> >>> The most probable reason would be misconfiguration, since the append
> >>> feature is very well tested and a simple kill -9 wouldn't cause all
> >>> sorts of issues. Else, it could be caused by an HBase bug, in which
> >>> case exceptions would probably come out either during log splitting or
> >>> during log replay (which happens when the region is reassigned to
> >>> another region server). In your case you could take a look at the RS
> >>> log where -ROOT- ends up and see if there's anything weird.
> >>>
> >>> To test if appends works correctly, in order to rule that out, just
> >>> bring up your cluster and kill -9 right away the RS that's holding
> >>> ROOT. Once the master did the log splitting, do a lookup directly in
> >>> hadoop (bin/hadoop fs -cat) on that log file which should be under
> >>> /hbase/.oldlogs/serverstring/somelog and look for that -ROOT- entry.
> >>> If it's empty or missing, it's probably a misconfiguration.
> >>>
> >>> J-D
> >>>
> >>> On Tue, Nov 2, 2010 at 9:02 AM, Erdem Agaoglu <erdem.agaoglu@gmail.com
> >
> >>> wrote:
> >>> > We reproduced the problem by kill -9ing the region server that hosts
> >>> -ROOT-
> >>> > again. Results are the same with what i have explained before. After
> log
> >>> > splitting metaScanner complained that .META. is not valid and
> reassigned
> >>> it
> >>> > to somewhere else. As a result, the number of rows in .META. reduced
> >>> from 24
> >>> > to 16 meaning we lost access to some regions and some table
> definitions.
> >>> To
> >>> > support what you have said about the data in the memstore, we seem to
> >>> have
> >>> > lost only recent regions.
> >>> >
> >>> > status 'detailed' showed no regionsInTransition, and showed a total
> of
> >>> 39
> >>> > regions assigned to region servers, but as i've said before, we have
> >>> access
> >>> > to only 16 of them. No need to mention our queries fail.
> >>> >
> >>> > Any idea we could try is greatly appreciated. Thanks in advance.
> >>> >
> >>> > PS. Something probably irrelevant happened this time, interface 60010
> >>> lists
> >>> > all our tables while 'list' command in shell lists only a few. We get
> >>> > TableNotFoundExceptions on tables shown in web interface but missing
> in
> >>> > 'list' command output. Seems like web interface is able to list
> >>> inaccessible
> >>> > tables.
> >>> >
> >>> > On Tue, Nov 2, 2010 at 9:56 AM, Erdem Agaoglu <
> erdem.agaoglu@gmail.com
> >>> >wrote:
> >>> >
> >>> >> By "lost tables" i mean no user table definitions were listed in
> >>> interface
> >>> >> 60010 and my queries got me TableNotFoundExceptions. Routine
> >>> BaseScanner
> >>> >> logged like
> >>> >>
> >>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region ...
> >>> complete
> >>> >>
> >>> >> so i guess my .META. was empty. But unfortunately I don't know if
> any
> >>> >> regions were stuck in transition, i'll be sure to check this out
> while
> >>> i try
> >>> >> to reproduce.
> >>> >>
> >>> >> I rechecked the logs, specifically after the splitting completed and
> it
> >>> >> contains lines like "Current assignment is not valid..." so i guess
> >>> this is
> >>> >> something unexpected. But in hopes of some configuration error you
> may
> >>> spot,
> >>> >> whole log goes like that (stripped to be more readable):
> >>> >>
> >>> >> # hlog file splitting completed in 80372 millis
> >>> >> # Log split complete, meta reassignment and scanning:
> >>> >> # ProcessServerShutdown reassigning ROOT region
> >>> >> # -ROOT- region unset (but not set to be reassigned)
> >>> >> # ROOT inserted into regionsInTransition
> >>> >> # Assigning for serverName=C...  total nregions to assign=1, regions
> to
> >>> >> give other servers than this=0, isMetaAssign=true
> >>> >> # Assigning serverName=C... 1 regions
> >>> >> # Assigning region -ROOT-... to C...
> >>> >> # Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
> >>> >> # RegionManager.rootScanner scanning meta region {server: C ...
> >>> >> # Current assignment of .META.... is not valid;  serverAddress=A,
> >>> >> startCode=blah unknown; checking once more!
> >>> >> # Current assignment of .META.... is not valid;  serverAddress=A,
> >>> >> startCode=blah unknown.
> >>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server:
> C
> >>> ...
> >>> >> complete
> >>> >> # RegionManager.rootScanner scanning meta region {server: C ...
> >>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server:
> C
> >>> ...
> >>> >> complete
> >>> >> # Assigning for serverName=E...  total nregions to assign=-1,
> regions
> >>> to
> >>> >> give other servers than this=2, isMetaAssign=true
> >>> >> # Assigning serverName=E... -1 regions
> >>> >> # Assigning region .META.... to E...
> >>> >> # Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
> >>> >> # Processing todo: PendingOpenOperation from E...
> >>> >> # .META.... open on E
> >>> >> # Updated row .META.... in region -ROOT- ...
> >>> >> # Adding to onlineMetaRegions: {server: E ...
> >>> >> # RegionManager.metaScanner scanning meta region {server: E ...
> >>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region {server:
> E
> >>> ...
> >>> >> # All 1 .META. region(s) scanned
> >>> >> 10 secs later
> >>> >> # Processing todo: ProcessServerShutdown of F
> >>> >> # Process shutdown of server F...: logSplit: true, rootRescanned:
> >>> false,
> >>> >> numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> >>> >> # Log split complete, meta reassignment and scanning
> >>> >> # Process server shutdown scanning root region on C
> >>> >> # Process server shutdown scanning root region on C finished master
> >>> >> # process server shutdown scanning .META.,,1 on E
> >>> >> # process server shutdown finished scanning .META.,,1 on E
> >>> >> # Removed F... from deadservers Map
> >>> >>
> >>> >> Thanks again.
> >>> >>
> >>> >> On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <
> >>> jdcryans@apache.org>wrote:
> >>> >>
> >>> >>> The fact that the tables are "revived" is a clue here IMO, but
> let's
> >>> >>> go back to more basic questions...
> >>> >>>
> >>> >>> So when you say that during step 1 you lost tables, what do you
> mean
> >>> >>> by "lost"? Were the rows of those tables still in .META.? Were the
> >>> >>> regions stuck in transition (in the shell, do: status 'detailed')?
> Or
> >>> >>> when you tried to query them you just got a TableNotFoundException?
> >>> >>>
> >>> >>> Also, the fact that only -ROOT- and not .META. was on this region
> >>> >>> server means that if there was any data lost, it would be .META.'s
> >>> >>> location and it would have been assigned somewhere else (E), but
> still
> >>> >>> stayed assigned on A. Since the data is in the memstore, recent
> data
> >>> >>> couldn't be read by this second assignment of .META. but... it
> could
> >>> >>> also be reassigned for a "normal" reason like rebalancing. The best
> >>> >>> way to confirm that is when the -ROOT- region gets reassigned at
> the
> >>> >>> end of step 1 (so this is after the message that goes like "...file
> >>> >>> splitting completed in 80372..."), do so see something like this in
> >>> >>> the master's log: "Current assignment of .META.,,some_timestamp  is
> >>> >>> not valid; serverAddress=blah, startcode=blah unknown."? If so,
> then
> >>> >>> it seems that data was lost and this is really unexpected.
> >>> >>>
> >>> >>> J-D
> >>> >>>
> >>> >>> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <
> >>> erdem.agaoglu@gmail.com>
> >>> >>> wrote:
> >>> >>> > Hi again,
> >>> >>> >
> >>> >>> > I have re-checked our configuration to confirm that we have
> >>> >>> > dfs.support.append enabled and saw "Using syncFs -- HDFS-200" in
> >>> logs. I
> >>> >>> > inspected logs around log splits to find something, but i'm not
> sure
> >>> >>> that's
> >>> >>> > what we are looking for. In the first step of the scenario i
> >>> mentioned
> >>> >>> > before (where we kill -9ed everything on the node that hosts the
> >>> ROOT
> >>> >>> > region), HLog says (stripping hdfs:// prefixes and hostnames for
> >>> >>> clarity)
> >>> >>> >
> >>> >>> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
> >>> >>> >
> >>> >>> > Then it goes over every single one like
> >>> >>> >
> >>> >>> > # Splitting hlog 1 of 7
> >>> >>> > # Splitting hlog 2 of 7
> >>> >>> > # ...
> >>> >>> > # Splitting hlog 7 of 7
> >>> >>> >
> >>> >>> > On the 7th hlog it WARNs with two lines
> >>> >>> >
> >>> >>> > # File .logs/F,60020,1287491528908/10.1.10.229
> >>> %3A60020.1288021443546
> >>> >>> might
> >>> >>> > be still open, length is 0
> >>> >>> > # Could not open .logs/F,60020,1287491528908/10.1.10.229
> >>> >>> %3A60020.1288021443546
> >>> >>> > for reading. File is emptyjava.io.EOFException
> >>> >>> >
> >>> >>> > And completes with
> >>> >>> >
> >>> >>> > # log file splitting completed in 80372 millis for
> >>> >>> > .logs/F,60020,1287491528908
> >>> >>> >
> >>> >>> > This might be it, but on the sixth step (where we kill -9ed the
> >>> >>> RegionServer
> >>> >>> > that hosts the only META region), it splits 2 hlogs without any
> >>> empty
> >>> >>> file
> >>> >>> > problems nor any log above INFO, but as i told before, our
> testtable
> >>> >>> still
> >>> >>> > got lost.
> >>> >>> >
> >>> >>> > I'll try to reproduce the problem in a cleaner way, but in the
> >>> meantime,
> >>> >>> any
> >>> >>> > kind of pointers to any problems we might have is greatly
> >>> appreciated.
> >>> >>> >
> >>> >>> >
> >>> >>> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <
> >>> erdem.agaoglu@gmail.com
> >>> >>> >wrote:
> >>> >>> >
> >>> >>> >> Thanks for the answer.
> >>> >>> >>
> >>> >>> >> I am pretty sure we have dfs.support.append enabled. I remember
> >>> both
> >>> >>> the
> >>> >>> >> conf file and the logs, and don't recall seeing any errors on
> >>> 60010. I
> >>> >>> >> crawled through logs all yesterday but don't remember anything
> >>> >>> indicating a
> >>> >>> >> specific error too. But i'm not sure about that. Let me check
> that
> >>> and
> >>> >>> get
> >>> >>> >> back here on monday.
> >>> >>> >>
> >>> >>> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <
> >>> >>> jdcryans@apache.org>wrote:
> >>> >>> >>
> >>> >>> >>> First thing I'd check is if your configuration has
> >>> dfs.support.append,
> >>> >>> >>> you can confirm this by looking at your region server logs.
> When a
> >>> RS
> >>> >>> >>> starts, it creates an HLog and will print out: "Using syncFs --
> >>> >>> >>> HDFS-200" if it's configured, else you'll see "syncFs --
> HDFS-200
> >>> --
> >>> >>> >>> not available, dfs.support.append=false". Also the master web
> ui
> >>> (on
> >>> >>> >>> port 60010) will print an error message regarding that.
> >>> >>> >>>
> >>> >>> >>> If it's all ok, then you should take a look at the master log
> when
> >>> it
> >>> >>> >>> does the log splitting and see if it contains any obvious
> errors.
> >>> >>> >>>
> >>> >>> >>> J-D
> >>> >>> >>>
> >>> >>> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <
> >>> >>> erdem.agaoglu@gmail.com>
> >>> >>> >>> wrote:
> >>> >>> >>> > Hi all,
> >>> >>> >>> >
> >>> >>> >>> > We have a testing cluster of 6 nodes which we try to run an
> >>> >>> >>> HBase/MapReduce
> >>> >>> >>> > application on. In order to simulate a power failure we kill
> >>> -9ed
> >>> >>> all
> >>> >>> >>> things
> >>> >>> >>> > hadoop related on one of the slave nodes (DataNode,
> >>> RegionServer,
> >>> >>> >>> > TaskTracker, ZK quorum peer and i think SecondaryNameNode was
> on
> >>> >>> this
> >>> >>> >>> node
> >>> >>> >>> > too). We were expecting a smooth transition on all services
> but
> >>> were
> >>> >>> >>> unable
> >>> >>> >>> > to get on HBase end. While our regions seemed intact (not
> >>> >>> confirmed), we
> >>> >>> >>> > lost table definitions that pointed some kind of META region
> >>> fail.
> >>> >>> So
> >>> >>> >>> our
> >>> >>> >>> > application failed with several TableNotFoundExceptions.
> >>> Simulation
> >>> >>> was
> >>> >>> >>> > conducted with no-load and extremely small data (like 10 rows
> in
> >>> 3
> >>> >>> >>> tables).
> >>> >>> >>> >
> >>> >>> >>> > On our setup, HBase is 0.89.20100924, r1001068 while Hadoop
> >>> >>> >>> > runs 0.20.3-append-r964955-1240, r960957. Most of the
> >>> configuration
> >>> >>> >>> > parameters are in default.
> >>> >>> >>> >
> >>> >>> >>> > If we did something wrong up to this point, please ignore the
> >>> rest
> >>> >>> of
> >>> >>> >>> the
> >>> >>> >>> > message as i'll try to explain what we did to reproduce it
> and
> >>> might
> >>> >>> be
> >>> >>> >>> > irrelevant.
> >>> >>> >>> >
> >>> >>> >>> > Say the machines are named A, B, C, D, E, F; where A is
> >>> master-like
> >>> >>> >>> node,
> >>> >>> >>> > others are slaves and power fail is on F. Since we have
> little
> >>> data,
> >>> >>> we
> >>> >>> >>> have
> >>> >>> >>> > one ROOT and only one META region. I'll try to sum up the
> whole
> >>> >>> >>> scenario.
> >>> >>> >>> >
> >>> >>> >>> > A: NN, DN, JT, TT, HM, RS
> >>> >>> >>> > B: DN, TT, RS, ZK
> >>> >>> >>> > C: DN, TT, RS, ZK
> >>> >>> >>> > D: DN, TT, RS, ZK
> >>> >>> >>> > E: DN, TT, RS, ZK
> >>> >>> >>> > F: SNN, DN, TT, RS, ZK
> >>> >>> >>> >
> >>> >>> >>> > 0. Initial state -> ROOT: F, META: A
> >>> >>> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables, waited
> >>> for
> >>> >>> about
> >>> >>> >>> half
> >>> >>> >>> > an hour to get nothing BTW
> >>> >>> >>> > 2. Put F back online -> No effect
> >>> >>> >>> > 3. Create a table 'testtable' to see if we lose it
> >>> >>> >>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
> >>> >>> >>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
> >>> >>> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost
> >>> >>> >>> 'testtable'
> >>> >>> >>> > but get our tables from before the simulation. It seemed like
> >>> >>> because A
> >>> >>> >>> had
> >>> >>> >>> > META before the simulation, the table definitions were
> revived.
> >>> >>> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost
> 2
> >>> out
> >>> >>> of
> >>> >>> >>> our
> >>> >>> >>> > original 6 tables, 'testtable' revived. That small data seems
> >>> >>> corrupted
> >>> >>> >>> too
> >>> >>> >>> > as our Scans don't finish.
> >>> >>> >>> > 8. Run to mailing-list.
> >>> >>> >>> >
> >>> >>> >>> > First of all thanks for reading up to this point. From what
> we
> >>> are
> >>> >>> now,
> >>> >>> >>> we
> >>> >>> >>> > are not even sure if this is the expected behavior, like if
> ROOT
> >>> or
> >>> >>> META
> >>> >>> >>> > region dies we lose data and must do sth like hbck, or if we
> are
> >>> >>> missing
> >>> >>> >>> a
> >>> >>> >>> > configuration, or if this is a bug. No need to mention that
> we
> >>> are
> >>> >>> >>> > relatively new to HBase so the last possibility is that if we
> >>> didn't
> >>> >>> >>> > understand it at all.
> >>> >>> >>> >
> >>> >>> >>> > Thanks in advance for any ideas.
> >>> >>> >>> >
> >>> >>> >>> > --
> >>> >>> >>> > erdem agaoglu
> >>> >>> >>> >
> >>> >>> >>>
> >>> >>> >>
> >>> >>> >>
> >>> >>> >>
> >>> >>> >> --
> >>> >>> >> erdem agaoglu
> >>> >>> >>
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > --
> >>> >>> > erdem agaoglu
> >>> >>> >
> >>> >>>
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> erdem agaoglu
> >>> >>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > erdem agaoglu
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> erdem agaoglu
> >>
> >
> >
> >
> > --
> > erdem agaoglu
> >
>



-- 
erdem agaoglu

Re: Node failure causes weird META data?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Very interesting finding, Erdem! I tried it on a local empty instance
and set the roll period to 6 secs; here's what I saw:

org.apache.hadoop.hbase.regionserver.LogRoller: Hlog roll period 6000ms elapsed
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using
syncFs -- HDFS-200
org.apache.hadoop.hbase.regionserver.wal.HLog: Roll
/hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177%3A60020.1288977933829,
entries=1, filesize=295. New hlog
/hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177%3A60020.1288977943913
org.apache.hadoop.hbase.regionserver.wal.HLog: Found 1 hlogs to remove
 out of total 1; oldest outstanding sequenceid is 270055 from region
-ROOT-,,0
org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog file
/hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177%3A60020.1288977933829
whose highest sequenceid is 270054 to
/hbase-89-su/.oldlogs/10.10.1.177%3A60020.1288977933829

This looks like a very bad bug; I opened
https://issues.apache.org/jira/browse/HBASE-3198

Can you correlate this with the previous tests you did? Hopefully it's
the same problem.
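
If you want to double-check on your side, a crude way to see whether a rolled
log still carries the -ROOT-/.META. edits is to cat it out of .oldlogs and
grep for the region names (the path below is just the one from my local run
above):

  bin/hadoop fs -ls /hbase-89-su/.oldlogs
  bin/hadoop fs -cat /hbase-89-su/.oldlogs/10.10.1.177%3A60020.1288977933829 | strings | grep -c '.META.'

A count of zero for a log that was supposed to hold the .META. location row
would match the data loss you saw.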

Thanks a lot for your patience and investigative work!

J-D

On Fri, Nov 5, 2010 at 9:28 AM, Erdem Agaoglu <er...@gmail.com> wrote:
> We have found the problem, again. But this time, we dug deeper and extracted
> more info.
>
> First, the reason that we were unable to reproduce was trying to kill the RS
> that holds -ROOT- too soon. That way,
> 1. log split goes fine,
> 2. -ROOT- gets assigned on somewhere else,
> 3. log replaying for that RS goes smoothly
> 4. .META. does not change.
> 5. everything is normal
>
> But if we try after a while of running the cluster,
> 1. log split fails with java.io.EOFException for the only hlog file,
> 2. -ROOT- gets assigned on somewhere else,
> 3. no log is replayed since none split
> 4. .META. is invalid, a new .META. is designated
> 5. data corrupted
>
> Before killing RS that holds -ROOT-, we checked the only hlog file of it,
> and see that it contains the updates for the place of the .META. in the
> first scenario. But in the second failing scenario, the file was empty. We
> found that the log rolling process, which happens once an hour by default,
> somehow thinks that the hlog is old and cleans it up.
>
> # LogRoller: Hlog roll period 3600000ms elapsed
> # SequenceFileLogWriter: Using syncFs -- HDFS-200
> # HLog: Roll /hbase/.logs/A/blahfile0316, entries=1, filesize=312. New hlog
> /hbase/.logs/A/blahfile0476
> # HLog: Found 1 hlogs to remove  out of total 1; oldest outstanding
> sequenceid is 4 from region -ROOT-,,0
> # HLog: moving old hlog file /hbase/.logs/A/blahfile0316 whose highest
> sequenceid is 3 to /hbase/.oldlogs/blahfile0316
>
> In that instance, blahfile0316 contains some rows while blahfile0476 is
> empty. It very much seems like the problem is here, as it thinks the data in
> memstore to be persisted while it is not. This seemed like a bug to us but
> might as well be a misconfiguration. Any idea is greatly appreciated.
>
>
> On Wed, Nov 3, 2010 at 4:36 PM, Erdem Agaoglu <er...@gmail.com>wrote:
>
>> We suspect some misconfiguration too but are unsure what that might be.
>> Almost all of the configuration parameters are in default along with
>> dfs.support.append which is true in append branch. We checked RS logs and
>> couldn't find any Exceptions, everything seems normal so it is unlikely to
>> be a bug. The only WARN was about the last hlog file being empty but i guess
>> this is normal.
>>
>> To confirm it is a append-configuration related problem, we killed the RS
>> that holds -ROOT- right after running the cluster and checked the
>> /hbase/.oldlogs/. There was a file containing a line for -ROOT- so i think
>> append works correctly. The file got removed after a while.
>>
>> BTW, we tried to reproduce it with smaller data (to control manually, like
>> 2 rows in 1 table, newly put) but didn't have any problems. I am starting to
>> think the problem just disappeared. I'll increase rows to hundreds and check
>> again.
>>
>> Until i get the problem again, Thanks for your time.
>>
>> On Tue, Nov 2, 2010 at 7:28 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>>
>>> This confirms my suspicion that -ROOT-'s latest data was lost and that
>>> .META. got double assigned, which easily explains all the weirdness
>>> you're seeing after the log replay.
>>>
>>> The most probable reason would be misconfiguration, since the append
>>> feature is very well tested and a simple kill -9 wouldn't cause all
>>> sorts of issues. Else, it could be caused by an HBase bug, in which
>>> case exceptions would probably come out either during log splitting or
>>> during log replay (which happens when the region is reassigned to
>>> another region server). In your case you could take a look at the RS
>>> log where -ROOT- ends up and see if there's anything weird.
>>>
>>> To test if appends works correctly, in order to rule that out, just
>>> bring up your cluster and kill -9 right away the RS that's holding
>>> ROOT. Once the master did the log splitting, do a lookup directly in
>>> hadoop (bin/hadoop fs -cat) on that log file which should be under
>>> /hbase/.oldlogs/serverstring/somelog and look for that -ROOT- entry.
>>> If it's empty or missing, it's probably a misconfiguration.
>>>
>>> J-D
>>>
>>> On Tue, Nov 2, 2010 at 9:02 AM, Erdem Agaoglu <er...@gmail.com>
>>> wrote:
>>> > We reproduced the problem by kill -9ing the region server that hosts
>>> -ROOT-
>>> > again. Results are the same with what i have explained before. After log
>>> > splitting metaScanner complained that .META. is not valid and reassigned
>>> it
>>> > to somewhere else. As a result, the number of rows in .META. reduced
>>> from 24
>>> > to 16 meaning we lost access to some regions and some table definitions.
>>> To
>>> > support what you have said about the data in the memstore, we seem to
>>> have
>>> > lost only recent regions.
>>> >
>>> > status 'detailed' showed no regionsInTransition, and showed a total of
>>> 39
>>> > regions assigned to region servers, but as i've said before, we have
>>> access
>>> > to only 16 of them. No need to mention our queries fail.
>>> >
>>> > Any idea we could try is greatly appreciated. Thanks in advance.
>>> >
>>> > PS. Something probably irrelevant happened this time, interface 60010
>>> lists
>>> > all our tables while 'list' command in shell lists only a few. We get
>>> > TableNotFoundExceptions on tables shown in web interface but missing in
>>> > 'list' command output. Seems like web interface is able to list
>>> inaccessible
>>> > tables.
>>> >
>>> > On Tue, Nov 2, 2010 at 9:56 AM, Erdem Agaoglu <erdem.agaoglu@gmail.com
>>> >wrote:
>>> >
>>> >> By "lost tables" i mean no user table definitions were listed in
>>> interface
>>> >> 60010 and my queries got me TableNotFoundExceptions. Routine
>>> BaseScanner
>>> >> logged like
>>> >>
>>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region ...
>>> complete
>>> >>
>>> >> so i guess my .META. was empty. But unfortunately I don't know if any
>>> >> regions were stuck in transition, i'll be sure to check this out while
>>> i try
>>> >> to reproduce.
>>> >>
>>> >> I rechecked the logs, specifically after the splitting completed and it
>>> >> contains lines like "Current assignment is not valid..." so i guess
>>> this is
>>> >> something unexpected. But in hopes of some configuration error you may
>>> spot,
>>> >> whole log goes like that (stripped to be more readable):
>>> >>
>>> >> # hlog file splitting completed in 80372 millis
>>> >> # Log split complete, meta reassignment and scanning:
>>> >> # ProcessServerShutdown reassigning ROOT region
>>> >> # -ROOT- region unset (but not set to be reassigned)
>>> >> # ROOT inserted into regionsInTransition
>>> >> # Assigning for serverName=C...  total nregions to assign=1, regions to
>>> >> give other servers than this=0, isMetaAssign=true
>>> >> # Assigning serverName=C... 1 regions
>>> >> # Assigning region -ROOT-... to C...
>>> >> # Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
>>> >> # RegionManager.rootScanner scanning meta region {server: C ...
>>> >> # Current assignment of .META.... is not valid;  serverAddress=A,
>>> >> startCode=blah unknown; checking once more!
>>> >> # Current assignment of .META.... is not valid;  serverAddress=A,
>>> >> startCode=blah unknown.
>>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C
>>> ...
>>> >> complete
>>> >> # RegionManager.rootScanner scanning meta region {server: C ...
>>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C
>>> ...
>>> >> complete
>>> >> # Assigning for serverName=E...  total nregions to assign=-1, regions
>>> to
>>> >> give other servers than this=2, isMetaAssign=true
>>> >> # Assigning serverName=E... -1 regions
>>> >> # Assigning region .META.... to E...
>>> >> # Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
>>> >> # Processing todo: PendingOpenOperation from E...
>>> >> # .META.... open on E
>>> >> # Updated row .META.... in region -ROOT- ...
>>> >> # Adding to onlineMetaRegions: {server: E ...
>>> >> # RegionManager.metaScanner scanning meta region {server: E ...
>>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region {server: E
>>> ...
>>> >> # All 1 .META. region(s) scanned
>>> >> 10 secs later
>>> >> # Processing todo: ProcessServerShutdown of F
>>> >> # Process shutdown of server F...: logSplit: true, rootRescanned:
>>> false,
>>> >> numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
>>> >> # Log split complete, meta reassignment and scanning
>>> >> # Process server shutdown scanning root region on C
>>> >> # Process server shutdown scanning root region on C finished master
>>> >> # process server shutdown scanning .META.,,1 on E
>>> >> # process server shutdown finished scanning .META.,,1 on E
>>> >> # Removed F... from deadservers Map
>>> >>
>>> >> Thanks again.
>>> >>
>>> >> On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <
>>> jdcryans@apache.org>wrote:
>>> >>
>>> >>> The fact that the tables are "revived" is a clue here IMO, but let's
>>> >>> go back to more basic questions...
>>> >>>
>>> >>> So when you say that during step 1 you lost tables, what do you mean
>>> >>> by "lost"? Were the rows of those tables still in .META.? Were the
>>> >>> regions stuck in transition (in the shell, do: status 'detailed')? Or
>>> >>> when you tried to query them you just got a TableNotFoundException?
>>> >>>
>>> >>> Also, the fact that only -ROOT- and not .META. was on this region
>>> >>> server means that if there was any data lost, it would be .META.'s
>>> >>> location and it would have been assigned somewhere else (E), but still
>>> >>> stayed assigned on A. Since the data is in the memstore, recent data
>>> >>> couldn't be read by this second assignment of .META. but... it could
>>> >>> also be reassigned for a "normal" reason like rebalancing. The best
>>> >>> way to confirm that is when the -ROOT- region gets reassigned at the
>>> >>> end of step 1 (so this is after the message that goes like "...file
>>> >>> splitting completed in 80372..."), do so see something like this in
>>> >>> the master's log: "Current assignment of .META.,,some_timestamp  is
>>> >>> not valid; serverAddress=blah, startcode=blah unknown."? If so, then
>>> >>> it seems that data was lost and this is really unexpected.
>>> >>>
>>> >>> J-D
>>> >>>
>>> >>> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <
>>> erdem.agaoglu@gmail.com>
>>> >>> wrote:
>>> >>> > Hi again,
>>> >>> >
>>> >>> > I have re-checked our configuration to confirm that we have
>>> >>> > dfs.support.append enabled and saw "Using syncFs -- HDFS-200" in
>>> logs. I
>>> >>> > inspected logs around log splits to find something, but i'm not sure
>>> >>> that's
>>> >>> > what we are looking for. In the first step of the scenario i
>>> mentioned
>>> >>> > before (where we kill -9ed everything on the node that hosts the
>>> ROOT
>>> >>> > region), HLog says (stripping hdfs:// prefixes and hostnames for
>>> >>> clarity)
>>> >>> >
>>> >>> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
>>> >>> >
>>> >>> > Then it goes over every single one like
>>> >>> >
>>> >>> > # Splitting hlog 1 of 7
>>> >>> > # Splitting hlog 2 of 7
>>> >>> > # ...
>>> >>> > # Splitting hlog 7 of 7
>>> >>> >
>>> >>> > On the 7th hlog it WARNs with two lines
>>> >>> >
>>> >>> > # File .logs/F,60020,1287491528908/10.1.10.229
>>> %3A60020.1288021443546
>>> >>> might
>>> >>> > be still open, length is 0
>>> >>> > # Could not open .logs/F,60020,1287491528908/10.1.10.229
>>> >>> %3A60020.1288021443546
>>> >>> > for reading. File is emptyjava.io.EOFException
>>> >>> >
>>> >>> > And completes with
>>> >>> >
>>> >>> > # log file splitting completed in 80372 millis for
>>> >>> > .logs/F,60020,1287491528908
>>> >>> >
>>> >>> > This might be it, but on the sixth step (where we kill -9ed the
>>> >>> RegionServer
>>> >>> > that hosts the only META region), it splits 2 hlogs without any
>>> empty
>>> >>> file
>>> >>> > problems nor any log above INFO, but as i told before, our testtable
>>> >>> still
>>> >>> > got lost.
>>> >>> >
>>> >>> > I'll try to reproduce the problem in a cleaner way, but in the
>>> meantime,
>>> >>> any
>>> >>> > kind of pointers to any problems we might have is greatly
>>> appreciated.
>>> >>> >
>>> >>> >
>>> >>> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <
>>> erdem.agaoglu@gmail.com
>>> >>> >wrote:
>>> >>> >
>>> >>> >> Thanks for the answer.
>>> >>> >>
>>> >>> >> I am pretty sure we have dfs.support.append enabled. I remember
>>> both
>>> >>> the
>>> >>> >> conf file and the logs, and don't recall seeing any errors on
>>> 60010. I
>>> >>> >> crawled through logs all yesterday but don't remember anything
>>> >>> indicating a
>>> >>> >> specific error too. But i'm not sure about that. Let me check that
>>> and
>>> >>> get
>>> >>> >> back here on monday.
>>> >>> >>
>>> >>> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <
>>> >>> jdcryans@apache.org>wrote:
>>> >>> >>
>>> >>> >>> First thing I'd check is if your configuration has
>>> dfs.support.append,
>>> >>> >>> you can confirm this by looking at your region server logs. When a
>>> RS
>>> >>> >>> starts, it creates an HLog and will print out: "Using syncFs --
>>> >>> >>> HDFS-200" if it's configured, else you'll see "syncFs -- HDFS-200
>>> --
>>> >>> >>> not available, dfs.support.append=false". Also the master web ui
>>> (on
>>> >>> >>> port 60010) will print an error message regarding that.
>>> >>> >>>
>>> >>> >>> If it's all ok, then you should take a look at the master log when
>>> it
>>> >>> >>> does the log splitting and see if it contains any obvious errors.
>>> >>> >>>
>>> >>> >>> J-D
>>> >>> >>>
>>> >>> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <
>>> >>> erdem.agaoglu@gmail.com>
>>> >>> >>> wrote:
>>> >>> >>> > Hi all,
>>> >>> >>> >
>>> >>> >>> > We have a testing cluster of 6 nodes which we try to run an
>>> >>> >>> HBase/MapReduce
>>> >>> >>> > application on. In order to simulate a power failure we kill
>>> -9ed
>>> >>> all
>>> >>> >>> things
>>> >>> >>> > hadoop related on one of the slave nodes (DataNode,
>>> RegionServer,
>>> >>> >>> > TaskTracker, ZK quorum peer and i think SecondaryNameNode was on
>>> >>> this
>>> >>> >>> node
>>> >>> >>> > too). We were expecting a smooth transition on all services but
>>> were
>>> >>> >>> unable
>>> >>> >>> > to get on HBase end. While our regions seemed intact (not
>>> >>> confirmed), we
>>> >>> >>> > lost table definitions that pointed some kind of META region
>>> fail.
>>> >>> So
>>> >>> >>> our
>>> >>> >>> > application failed with several TableNotFoundExceptions.
>>> Simulation
>>> >>> was
>>> >>> >>> > conducted with no-load and extremely small data (like 10 rows in
>>> 3
>>> >>> >>> tables).
>>> >>> >>> >
>>> >>> >>> > On our setup, HBase is 0.89.20100924, r1001068 while Hadoop
>>> >>> >>> > runs 0.20.3-append-r964955-1240, r960957. Most of the
>>> configuration
>>> >>> >>> > parameters are in default.
>>> >>> >>> >
>>> >>> >>> > If we did something wrong up to this point, please ignore the
>>> rest
>>> >>> of
>>> >>> >>> the
>>> >>> >>> > message as i'll try to explain what we did to reproduce it and
>>> might
>>> >>> be
>>> >>> >>> > irrelevant.
>>> >>> >>> >
>>> >>> >>> > Say the machines are named A, B, C, D, E, F; where A is
>>> master-like
>>> >>> >>> node,
>>> >>> >>> > others are slaves and power fail is on F. Since we have little
>>> data,
>>> >>> we
>>> >>> >>> have
>>> >>> >>> > one ROOT and only one META region. I'll try to sum up the whole
>>> >>> >>> scenario.
>>> >>> >>> >
>>> >>> >>> > A: NN, DN, JT, TT, HM, RS
>>> >>> >>> > B: DN, TT, RS, ZK
>>> >>> >>> > C: DN, TT, RS, ZK
>>> >>> >>> > D: DN, TT, RS, ZK
>>> >>> >>> > E: DN, TT, RS, ZK
>>> >>> >>> > F: SNN, DN, TT, RS, ZK
>>> >>> >>> >
>>> >>> >>> > 0. Initial state -> ROOT: F, META: A
>>> >>> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables, waited
>>> for
>>> >>> about
>>> >>> >>> half
>>> >>> >>> > an hour to get nothing BTW
>>> >>> >>> > 2. Put F back online -> No effect
>>> >>> >>> > 3. Create a table 'testtable' to see if we lose it
>>> >>> >>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
>>> >>> >>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
>>> >>> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost
>>> >>> >>> 'testtable'
>>> >>> >>> > but get our tables from before the simulation. It seemed like
>>> >>> because A
>>> >>> >>> had
>>> >>> >>> > META before the simulation, the table definitions were revived.
>>> >>> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost 2
>>> out
>>> >>> of
>>> >>> >>> our
>>> >>> >>> > original 6 tables, 'testtable' revived. That small data seems
>>> >>> corrupted
>>> >>> >>> too
>>> >>> >>> > as our Scans don't finish.
>>> >>> >>> > 8. Run to mailing-list.
>>> >>> >>> >
>>> >>> >>> > First of all thanks for reading up to this point. From what we
>>> are
>>> >>> now,
>>> >>> >>> we
>>> >>> >>> > are not even sure if this is the expected behavior, like if ROOT
>>> or
>>> >>> META
>>> >>> >>> > region dies we lose data and must do sth like hbck, or if we are
>>> >>> missing
>>> >>> >>> a
>>> >>> >>> > configuration, or if this is a bug. No need to mention that we
>>> are
>>> >>> >>> > relatively new to HBase so the last possibility is that if we
>>> didn't
>>> >>> >>> > understand it at all.
>>> >>> >>> >
>>> >>> >>> > Thanks in advance for any ideas.
>>> >>> >>> >
>>> >>> >>> > --
>>> >>> >>> > erdem agaoglu
>>> >>> >>> >
>>> >>> >>>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> --
>>> >>> >> erdem agaoglu
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > erdem agaoglu
>>> >>> >
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> erdem agaoglu
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > erdem agaoglu
>>> >
>>>
>>
>>
>>
>> --
>> erdem agaoglu
>>
>
>
>
> --
> erdem agaoglu
>

Re: Node failure causes weird META data?

Posted by Erdem Agaoglu <er...@gmail.com>.
We have found the problem, again. But this time, we dug deeper and extracted
more info.

First, the reason we were unable to reproduce it earlier is that we were
killing the RS that holds -ROOT- too soon. In that case,
1. log split goes fine,
2. -ROOT- gets assigned on somewhere else,
3. log replaying for that RS goes smoothly
4. .META. does not change.
5. everything is normal

But if we try after the cluster has been running for a while,
1. log split fails with java.io.EOFException for the only hlog file,
2. -ROOT- gets assigned on somewhere else,
3. no log is replayed since none split
4. .META. is invalid, a new .META. is designated
5. data corrupted

Before killing the RS that holds -ROOT-, we checked its only hlog file. In
the first scenario it contained the updates recording the location of .META.,
but in the second, failing scenario the file was empty. We found that the log
rolling process, which happens once an hour by default, somehow decides the
hlog is obsolete and cleans it up.

# LogRoller: Hlog roll period 3600000ms elapsed
# SequenceFileLogWriter: Using syncFs -- HDFS-200
# HLog: Roll /hbase/.logs/A/blahfile0316, entries=1, filesize=312. New hlog
/hbase/.logs/A/blahfile0476
# HLog: Found 1 hlogs to remove  out of total 1; oldest outstanding
sequenceid is 4 from region -ROOT-,,0
# HLog: moving old hlog file /hbase/.logs/A/blahfile0316 whose highest
sequenceid is 3 to /hbase/.oldlogs/blahfile0316

In that instance, blahfile0316 contains some rows while blahfile0476 is
empty. The problem very much seems to be here: the roller assumes the data in
the memstore has been persisted when it has not. This looks like a bug to us,
but it could also be a misconfiguration. Any idea is greatly appreciated.
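
In case anyone wants to watch for this live, the roll messages above can be
picked out of the region server log with something like (the log file name
depends on your install):

  tail -F logs/hbase-<user>-regionserver-<host>.log | grep -E 'LogRoller|HLog'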


On Wed, Nov 3, 2010 at 4:36 PM, Erdem Agaoglu <er...@gmail.com>wrote:

> We suspect some misconfiguration too but are unsure what that might be.
> Almost all of the configuration parameters are in default along with
> dfs.support.append which is true in append branch. We checked RS logs and
> couldn't find any Exceptions, everything seems normal so it is unlikely to
> be a bug. The only WARN was about the last hlog file being empty but i guess
> this is normal.
>
> To confirm it is a append-configuration related problem, we killed the RS
> that holds -ROOT- right after running the cluster and checked the
> /hbase/.oldlogs/. There was a file containing a line for -ROOT- so i think
> append works correctly. The file got removed after a while.
>
> BTW, we tried to reproduce it with smaller data (to control manually, like
> 2 rows in 1 table, newly put) but didn't have any problems. I am starting to
> think the problem just disappeared. I'll increase rows to hundreds and check
> again.
>
> Until i get the problem again, Thanks for your time.
>
> On Tue, Nov 2, 2010 at 7:28 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> This confirms my suspicion that -ROOT-'s latest data was lost and that
>> .META. got double assigned, which easily explains all the weirdness
>> you're seeing after the log replay.
>>
>> The most probable reason would be misconfiguration, since the append
>> feature is very well tested and a simple kill -9 wouldn't cause all
>> sorts of issues. Else, it could be caused by an HBase bug, in which
>> case exceptions would probably come out either during log splitting or
>> during log replay (which happens when the region is reassigned to
>> another region server). In your case you could take a look at the RS
>> log where -ROOT- ends up and see if there's anything weird.
>>
>> To test if appends works correctly, in order to rule that out, just
>> bring up your cluster and kill -9 right away the RS that's holding
>> ROOT. Once the master did the log splitting, do a lookup directly in
>> hadoop (bin/hadoop fs -cat) on that log file which should be under
>> /hbase/.oldlogs/serverstring/somelog and look for that -ROOT- entry.
>> If it's empty or missing, it's probably a misconfiguration.
>>
>> J-D
>>
>> On Tue, Nov 2, 2010 at 9:02 AM, Erdem Agaoglu <er...@gmail.com>
>> wrote:
>> > We reproduced the problem by kill -9ing the region server that hosts
>> -ROOT-
>> > again. Results are the same with what i have explained before. After log
>> > splitting metaScanner complained that .META. is not valid and reassigned
>> it
>> > to somewhere else. As a result, the number of rows in .META. reduced
>> from 24
>> > to 16 meaning we lost access to some regions and some table definitions.
>> To
>> > support what you have said about the data in the memstore, we seem to
>> have
>> > lost only recent regions.
>> >
>> > status 'detailed' showed no regionsInTransition, and showed a total of
>> 39
>> > regions assigned to region servers, but as i've said before, we have
>> access
>> > to only 16 of them. No need to mention our queries fail.
>> >
>> > Any idea we could try is greatly appreciated. Thanks in advance.
>> >
>> > PS. Something probably irrelevant happened this time, interface 60010
>> lists
>> > all our tables while 'list' command in shell lists only a few. We get
>> > TableNotFoundExceptions on tables shown in web interface but missing in
>> > 'list' command output. Seems like web interface is able to list
>> inaccessible
>> > tables.
>> >
>> > On Tue, Nov 2, 2010 at 9:56 AM, Erdem Agaoglu <erdem.agaoglu@gmail.com
>> >wrote:
>> >
>> >> By "lost tables" i mean no user table definitions were listed in
>> interface
>> >> 60010 and my queries got me TableNotFoundExceptions. Routine
>> BaseScanner
>> >> logged like
>> >>
>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region ...
>> complete
>> >>
>> >> so i guess my .META. was empty. But unfortunately I don't know if any
>> >> regions were stuck in transition, i'll be sure to check this out while
>> i try
>> >> to reproduce.
>> >>
>> >> I rechecked the logs, specifically after the splitting completed and it
>> >> contains lines like "Current assignment is not valid..." so i guess
>> this is
>> >> something unexpected. But in hopes of some configuration error you may
>> spot,
>> >> whole log goes like that (stripped to be more readable):
>> >>
>> >> # hlog file splitting completed in 80372 millis
>> >> # Log split complete, meta reassignment and scanning:
>> >> # ProcessServerShutdown reassigning ROOT region
>> >> # -ROOT- region unset (but not set to be reassigned)
>> >> # ROOT inserted into regionsInTransition
>> >> # Assigning for serverName=C...  total nregions to assign=1, regions to
>> >> give other servers than this=0, isMetaAssign=true
>> >> # Assigning serverName=C... 1 regions
>> >> # Assigning region -ROOT-... to C...
>> >> # Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
>> >> # RegionManager.rootScanner scanning meta region {server: C ...
>> >> # Current assignment of .META.... is not valid;  serverAddress=A,
>> >> startCode=blah unknown; checking once more!
>> >> # Current assignment of .META.... is not valid;  serverAddress=A,
>> >> startCode=blah unknown.
>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C
>> ...
>> >> complete
>> >> # RegionManager.rootScanner scanning meta region {server: C ...
>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C
>> ...
>> >> complete
>> >> # Assigning for serverName=E...  total nregions to assign=-1, regions
>> to
>> >> give other servers than this=2, isMetaAssign=true
>> >> # Assigning serverName=E... -1 regions
>> >> # Assigning region .META.... to E...
>> >> # Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
>> >> # Processing todo: PendingOpenOperation from E...
>> >> # .META.... open on E
>> >> # Updated row .META.... in region -ROOT- ...
>> >> # Adding to onlineMetaRegions: {server: E ...
>> >> # RegionManager.metaScanner scanning meta region {server: E ...
>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region {server: E
>> ...
>> >> # All 1 .META. region(s) scanned
>> >> 10 secs later
>> >> # Processing todo: ProcessServerShutdown of F
>> >> # Process shutdown of server F...: logSplit: true, rootRescanned:
>> false,
>> >> numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
>> >> # Log split complete, meta reassignment and scanning
>> >> # Process server shutdown scanning root region on C
>> >> # Process server shutdown scanning root region on C finished master
>> >> # process server shutdown scanning .META.,,1 on E
>> >> # process server shutdown finished scanning .META.,,1 on E
>> >> # Removed F... from deadservers Map
>> >>
>> >> Thanks again.
>> >>
>> >> On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <
>> jdcryans@apache.org>wrote:
>> >>
>> >>> The fact that the tables are "revived" is a clue here IMO, but let's
>> >>> go back to more basic questions...
>> >>>
>> >>> So when you say that during step 1 you lost tables, what do you mean
>> >>> by "lost"? Were the rows of those tables still in .META.? Were the
>> >>> regions stuck in transition (in the shell, do: status 'detailed')? Or
>> >>> when you tried to query them you just got a TableNotFoundException?
>> >>>
>> >>> Also, the fact that only -ROOT- and not .META. was on this region
>> >>> server means that if there was any data lost, it would be .META.'s
>> >>> location and it would have been assigned somewhere else (E), but still
>> >>> stayed assigned on A. Since the data is in the memstore, recent data
>> >>> couldn't be read by this second assignment of .META. but... it could
>> >>> also be reassigned for a "normal" reason like rebalancing. The best
>> >>> way to confirm that is when the -ROOT- region gets reassigned at the
>> >>> end of step 1 (so this is after the message that goes like "...file
>> >>> splitting completed in 80372..."), do so see something like this in
>> >>> the master's log: "Current assignment of .META.,,some_timestamp  is
>> >>> not valid; serverAddress=blah, startcode=blah unknown."? If so, then
>> >>> it seems that data was lost and this is really unexpected.
>> >>>
>> >>> J-D
>> >>>
>> >>> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <
>> erdem.agaoglu@gmail.com>
>> >>> wrote:
>> >>> > Hi again,
>> >>> >
>> >>> > I have re-checked our configuration to confirm that we have
>> >>> > dfs.support.append enabled and saw "Using syncFs -- HDFS-200" in
>> logs. I
>> >>> > inspected logs around log splits to find something, but i'm not sure
>> >>> that's
>> >>> > what we are looking for. In the first step of the scenario i
>> mentioned
>> >>> > before (where we kill -9ed everything on the node that hosts the
>> ROOT
>> >>> > region), HLog says (stripping hdfs:// prefixes and hostnames for
>> >>> clarity)
>> >>> >
>> >>> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
>> >>> >
>> >>> > Then it goes over every single one like
>> >>> >
>> >>> > # Splitting hlog 1 of 7
>> >>> > # Splitting hlog 2 of 7
>> >>> > # ...
>> >>> > # Splitting hlog 7 of 7
>> >>> >
>> >>> > On the 7th hlog it WARNs with two lines
>> >>> >
>> >>> > # File .logs/F,60020,1287491528908/10.1.10.229
>> %3A60020.1288021443546
>> >>> might
>> >>> > be still open, length is 0
>> >>> > # Could not open .logs/F,60020,1287491528908/10.1.10.229
>> >>> %3A60020.1288021443546
>> >>> > for reading. File is emptyjava.io.EOFException
>> >>> >
>> >>> > And completes with
>> >>> >
>> >>> > # log file splitting completed in 80372 millis for
>> >>> > .logs/F,60020,1287491528908
>> >>> >
>> >>> > This might be it, but on the sixth step (where we kill -9ed the
>> >>> RegionServer
>> >>> > that hosts the only META region), it splits 2 hlogs without any
>> empty
>> >>> file
>> >>> > problems nor any log above INFO, but as i told before, our testtable
>> >>> still
>> >>> > got lost.
>> >>> >
>> >>> > I'll try to reproduce the problem in a cleaner way, but in the
>> meantime,
>> >>> any
>> >>> > kind of pointers to any problems we might have is greatly
>> appreciated.
>> >>> >
>> >>> >
>> >>> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <
>> erdem.agaoglu@gmail.com
>> >>> >wrote:
>> >>> >
>> >>> >> Thanks for the answer.
>> >>> >>
>> >>> >> I am pretty sure we have dfs.support.append enabled. I remember
>> both
>> >>> the
>> >>> >> conf file and the logs, and don't recall seeing any errors on
>> 60010. I
>> >>> >> crawled through logs all yesterday but don't remember anything
>> >>> indicating a
>> >>> >> specific error too. But i'm not sure about that. Let me check that
>> and
>> >>> get
>> >>> >> back here on monday.
>> >>> >>
>> >>> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <
>> >>> jdcryans@apache.org>wrote:
>> >>> >>
>> >>> >>> First thing I'd check is if your configuration has
>> dfs.support.append,
>> >>> >>> you can confirm this by looking at your region server logs. When a
>> RS
>> >>> >>> starts, it creates an HLog and will print out: "Using syncFs --
>> >>> >>> HDFS-200" if it's configured, else you'll see "syncFs -- HDFS-200
>> --
>> >>> >>> not available, dfs.support.append=false". Also the master web ui
>> (on
>> >>> >>> port 60010) will print an error message regarding that.
>> >>> >>>
>> >>> >>> If it's all ok, then you should take a look at the master log when
>> it
>> >>> >>> does the log splitting and see if it contains any obvious errors.
>> >>> >>>
>> >>> >>> J-D
>> >>> >>>
>> >>> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <
>> >>> erdem.agaoglu@gmail.com>
>> >>> >>> wrote:
>> >>> >>> > Hi all,
>> >>> >>> >
>> >>> >>> > We have a testing cluster of 6 nodes which we try to run an
>> >>> >>> HBase/MapReduce
>> >>> >>> > application on. In order to simulate a power failure we kill
>> -9ed
>> >>> all
>> >>> >>> things
>> >>> >>> > hadoop related on one of the slave nodes (DataNode,
>> RegionServer,
>> >>> >>> > TaskTracker, ZK quorum peer and i think SecondaryNameNode was on
>> >>> this
>> >>> >>> node
>> >>> >>> > too). We were expecting a smooth transition on all services but
>> were
>> >>> >>> unable
>> >>> >>> > to get on HBase end. While our regions seemed intact (not
>> >>> confirmed), we
>> >>> >>> > lost table definitions that pointed some kind of META region
>> fail.
>> >>> So
>> >>> >>> our
>> >>> >>> > application failed with several TableNotFoundExceptions.
>> Simulation
>> >>> was
>> >>> >>> > conducted with no-load and extremely small data (like 10 rows in
>> 3
>> >>> >>> tables).
>> >>> >>> >
>> >>> >>> > On our setup, HBase is 0.89.20100924, r1001068 while Hadoop
>> >>> >>> > runs 0.20.3-append-r964955-1240, r960957. Most of the
>> configuration
>> >>> >>> > parameters are in default.
>> >>> >>> >
>> >>> >>> > If we did something wrong up to this point, please ignore the
>> rest
>> >>> of
>> >>> >>> the
>> >>> >>> > message as i'll try to explain what we did to reproduce it and
>> might
>> >>> be
>> >>> >>> > irrelevant.
>> >>> >>> >
>> >>> >>> > Say the machines are named A, B, C, D, E, F; where A is
>> master-like
>> >>> >>> node,
>> >>> >>> > others are slaves and power fail is on F. Since we have little
>> data,
>> >>> we
>> >>> >>> have
>> >>> >>> > one ROOT and only one META region. I'll try to sum up the whole
>> >>> >>> scenario.
>> >>> >>> >
>> >>> >>> > A: NN, DN, JT, TT, HM, RS
>> >>> >>> > B: DN, TT, RS, ZK
>> >>> >>> > C: DN, TT, RS, ZK
>> >>> >>> > D: DN, TT, RS, ZK
>> >>> >>> > E: DN, TT, RS, ZK
>> >>> >>> > F: SNN, DN, TT, RS, ZK
>> >>> >>> >
>> >>> >>> > 0. Initial state -> ROOT: F, META: A
>> >>> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables, waited
>> for
>> >>> about
>> >>> >>> half
>> >>> >>> > an hour to get nothing BTW
>> >>> >>> > 2. Put F back online -> No effect
>> >>> >>> > 3. Create a table 'testtable' to see if we lose it
>> >>> >>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
>> >>> >>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
>> >>> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost
>> >>> >>> 'testtable'
>> >>> >>> > but get our tables from before the simulation. It seemed like
>> >>> because A
>> >>> >>> had
>> >>> >>> > META before the simulation, the table definitions were revived.
>> >>> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost 2
>> out
>> >>> of
>> >>> >>> our
>> >>> >>> > original 6 tables, 'testtable' revived. That small data seems
>> >>> corrupted
>> >>> >>> too
>> >>> >>> > as our Scans don't finish.
>> >>> >>> > 8. Run to mailing-list.
>> >>> >>> >
>> >>> >>> > First of all thanks for reading up to this point. From what we
>> are
>> >>> now,
>> >>> >>> we
>> >>> >>> > are not even sure if this is the expected behavior, like if ROOT
>> or
>> >>> META
>> >>> >>> > region dies we lose data and must do sth like hbck, or if we are
>> >>> missing
>> >>> >>> a
>> >>> >>> > configuration, or if this is a bug. No need to mention that we
>> are
>> >>> >>> > relatively new to HBase so the last possibility is that if we
>> didn't
>> >>> >>> > understand it at all.
>> >>> >>> >
>> >>> >>> > Thanks in advance for any ideas.
>> >>> >>> >
>> >>> >>> > --
>> >>> >>> > erdem agaoglu
>> >>> >>> >
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> erdem agaoglu
>> >>> >>
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > erdem agaoglu
>> >>> >
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> erdem agaoglu
>> >>
>> >
>> >
>> >
>> > --
>> > erdem agaoglu
>> >
>>
>
>
>
> --
> erdem agaoglu
>



-- 
erdem agaoglu

Re: Node failure causes weird META data?

Posted by Erdem Agaoglu <er...@gmail.com>.
We suspect some misconfiguration too but are unsure what it might be.
Almost all of the configuration parameters are at their defaults, including
dfs.support.append, which is true in the append branch. We checked the RS logs
and couldn't find any Exceptions; everything seems normal, so it is unlikely
to be a bug. The only WARN was about the last hlog file being empty, but i
guess this is normal.

To confirm it is an append-configuration-related problem, we killed the RS
that holds -ROOT- right after starting the cluster and checked
/hbase/.oldlogs/. There was a file containing a line for -ROOT-, so i think
append works correctly. The file got removed after a while.

BTW, we tried to reproduce it with smaller data (so we could verify it
manually, like 2 rows in 1 table, newly put) but didn't have any problems. I
am starting to
think the problem just disappeared. I'll increase rows to hundreds and check
again.
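
In the meantime, the quick checks we'll keep running after each kill are just
one-liners against the hbase shell, e.g.:

  echo "status 'detailed'" | bin/hbase shell
  echo "list" | bin/hbase shell

to see whether anything is stuck in transition and whether the table
definitions are still visible.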

Until i get the problem again, thanks for your time.

On Tue, Nov 2, 2010 at 7:28 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> This confirms my suspicion that -ROOT-'s latest data was lost and that
> .META. got double assigned, which easily explains all the weirdness
> you're seeing after the log replay.
>
> The most probable reason would be misconfiguration, since the append
> feature is very well tested and a simple kill -9 wouldn't cause all
> sorts of issues. Else, it could be caused by an HBase bug, in which
> case exceptions would probably come out either during log splitting or
> during log replay (which happens when the region is reassigned to
> another region server). In your case you could take a look at the RS
> log where -ROOT- ends up and see if there's anything weird.
>
> To test if appends works correctly, in order to rule that out, just
> bring up your cluster and kill -9 right away the RS that's holding
> ROOT. Once the master did the log splitting, do a lookup directly in
> hadoop (bin/hadoop fs -cat) on that log file which should be under
> /hbase/.oldlogs/serverstring/somelog and look for that -ROOT- entry.
> If it's empty or missing, it's probably a misconfiguration.
>
> J-D
>
> On Tue, Nov 2, 2010 at 9:02 AM, Erdem Agaoglu <er...@gmail.com>
> wrote:
> > We reproduced the problem by kill -9ing the region server that hosts
> -ROOT-
> > again. Results are the same with what i have explained before. After log
> > splitting metaScanner complained that .META. is not valid and reassigned
> it
> > to somewhere else. As a result, the number of rows in .META. reduced from
> 24
> > to 16 meaning we lost access to some regions and some table definitions.
> To
> > support what you have said about the data in the memstore, we seem to
> have
> > lost only recent regions.
> >
> > status 'detailed' showed no regionsInTransition, and showed a total of 39
> > regions assigned to region servers, but as i've said before, we have
> access
> > to only 16 of them. No need to mention our queries fail.
> >
> > Any idea we could try is greatly appreciated. Thanks in advance.
> >
> > PS. Something probably irrelevant happened this time, interface 60010
> lists
> > all our tables while 'list' command in shell lists only a few. We get
> > TableNotFoundExceptions on tables shown in web interface but missing in
> > 'list' command output. Seems like web interface is able to list
> inaccessible
> > tables.
> >
> > On Tue, Nov 2, 2010 at 9:56 AM, Erdem Agaoglu <erdem.agaoglu@gmail.com
> >wrote:
> >
> >> By "lost tables" i mean no user table definitions were listed in
> interface
> >> 60010 and my queries got me TableNotFoundExceptions. Routine BaseScanner
> >> logged like
> >>
> >> # RegionManager.metaScanner scan of 0 row(s) of meta region ... complete
> >>
> >> so i guess my .META. was empty. But unfortunately I don't know if any
> >> regions were stuck in transition, i'll be sure to check this out while i
> try
> >> to reproduce.
> >>
> >> I rechecked the logs, specifically after the splitting completed and it
> >> contains lines like "Current assignment is not valid..." so i guess this
> is
> >> something unexpected. But in hopes of some configuration error you may
> spot,
> >> whole log goes like that (stripped to be more readable):
> >>
> >> # hlog file splitting completed in 80372 millis
> >> # Log split complete, meta reassignment and scanning:
> >> # ProcessServerShutdown reassigning ROOT region
> >> # -ROOT- region unset (but not set to be reassigned)
> >> # ROOT inserted into regionsInTransition
> >> # Assigning for serverName=C...  total nregions to assign=1, regions to
> >> give other servers than this=0, isMetaAssign=true
> >> # Assigning serverName=C... 1 regions
> >> # Assigning region -ROOT-... to C...
> >> # Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
> >> # RegionManager.rootScanner scanning meta region {server: C ...
> >> # Current assignment of .META.... is not valid;  serverAddress=A,
> >> startCode=blah unknown; checking once more!
> >> # Current assignment of .META.... is not valid;  serverAddress=A,
> >> startCode=blah unknown.
> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C
> ...
> >> complete
> >> # RegionManager.rootScanner scanning meta region {server: C ...
> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C
> ...
> >> complete
> >> # Assigning for serverName=E...  total nregions to assign=-1, regions to
> >> give other servers than this=2, isMetaAssign=true
> >> # Assigning serverName=E... -1 regions
> >> # Assigning region .META.... to E...
> >> # Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
> >> # Processing todo: PendingOpenOperation from E...
> >> # .META.... open on E
> >> # Updated row .META.... in region -ROOT- ...
> >> # Adding to onlineMetaRegions: {server: E ...
> >> # RegionManager.metaScanner scanning meta region {server: E ...
> >> # RegionManager.metaScanner scan of 0 row(s) of meta region {server: E
> ...
> >> # All 1 .META. region(s) scanned
> >> 10 secs later
> >> # Processing todo: ProcessServerShutdown of F
> >> # Process shutdown of server F...: logSplit: true, rootRescanned: false,
> >> numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> >> # Log split complete, meta reassignment and scanning
> >> # Process server shutdown scanning root region on C
> >> # Process server shutdown scanning root region on C finished master
> >> # process server shutdown scanning .META.,,1 on E
> >> # process server shutdown finished scanning .META.,,1 on E
> >> # Removed F... from deadservers Map
> >>
> >> Thanks again.
> >>
> >> On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <jdcryans@apache.org
> >wrote:
> >>
> >>> The fact that the tables are "revived" is a clue here IMO, but let's
> >>> go back to more basic questions...
> >>>
> >>> So when you say that during step 1 you lost tables, what do you mean
> >>> by "lost"? Were the rows of those tables still in .META.? Were the
> >>> regions stuck in transition (in the shell, do: status 'detailed')? Or
> >>> when you tried to query them you just got a TableNotFoundException?
> >>>
> >>> Also, the fact that only -ROOT- and not .META. was on this region
> >>> server means that if there was any data lost, it would be .META.'s
> >>> location and it would have been assigned somewhere else (E), but still
> >>> stayed assigned on A. Since the data is in the memstore, recent data
> >>> couldn't be read by this second assignment of .META. but... it could
> >>> also be reassigned for a "normal" reason like rebalancing. The best
> >>> way to confirm that is when the -ROOT- region gets reassigned at the
> >>> end of step 1 (so this is after the message that goes like "...file
> >>> splitting completed in 80372..."), do so see something like this in
> >>> the master's log: "Current assignment of .META.,,some_timestamp  is
> >>> not valid; serverAddress=blah, startcode=blah unknown."? If so, then
> >>> it seems that data was lost and this is really unexpected.
> >>>
> >>> J-D
> >>>
> >>> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <erdem.agaoglu@gmail.com
> >
> >>> wrote:
> >>> > Hi again,
> >>> >
> >>> > I have re-checked our configuration to confirm that we have
> >>> > dfs.support.append enabled and saw "Using syncFs -- HDFS-200" in
> logs. I
> >>> > inspected logs around log splits to find something, but i'm not sure
> >>> that's
> >>> > what we are looking for. In the first step of the scenario i
> mentioned
> >>> > before (where we kill -9ed everything on the node that hosts the ROOT
> >>> > region), HLog says (stripping hdfs:// prefixes and hostnames for
> >>> clarity)
> >>> >
> >>> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
> >>> >
> >>> > Then it goes over every single one like
> >>> >
> >>> > # Splitting hlog 1 of 7
> >>> > # Splitting hlog 2 of 7
> >>> > # ...
> >>> > # Splitting hlog 7 of 7
> >>> >
> >>> > On the 7th hlog it WARNs with two lines
> >>> >
> >>> > # File .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546
> >>> might
> >>> > be still open, length is 0
> >>> > # Could not open .logs/F,60020,1287491528908/10.1.10.229
> >>> %3A60020.1288021443546
> >>> > for reading. File is emptyjava.io.EOFException
> >>> >
> >>> > And completes with
> >>> >
> >>> > # log file splitting completed in 80372 millis for
> >>> > .logs/F,60020,1287491528908
> >>> >
> >>> > This might be it, but on the sixth step (where we kill -9ed the
> >>> RegionServer
> >>> > that hosts the only META region), it splits 2 hlogs without any empty
> >>> file
> >>> > problems nor any log above INFO, but as i told before, our testtable
> >>> still
> >>> > got lost.
> >>> >
> >>> > I'll try to reproduce the problem in a cleaner way, but in the
> meantime,
> >>> any
> >>> > kind of pointers to any problems we might have is greatly
> appreciated.
> >>> >
> >>> >
> >>> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <
> erdem.agaoglu@gmail.com
> >>> >wrote:
> >>> >
> >>> >> Thanks for the answer.
> >>> >>
> >>> >> I am pretty sure we have dfs.support.append enabled. I remember both
> >>> the
> >>> >> conf file and the logs, and don't recall seeing any errors on 60010.
> I
> >>> >> crawled through logs all yesterday but don't remember anything
> >>> indicating a
> >>> >> specific error too. But i'm not sure about that. Let me check that
> and
> >>> get
> >>> >> back here on monday.
> >>> >>
> >>> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <
> >>> jdcryans@apache.org>wrote:
> >>> >>
> >>> >>> First thing I'd check is if your configuration has
> dfs.support.append,
> >>> >>> you can confirm this by looking at your region server logs. When a
> RS
> >>> >>> starts, it creates an HLog and will print out: "Using syncFs --
> >>> >>> HDFS-200" if it's configured, else you'll see "syncFs -- HDFS-200
> --
> >>> >>> not available, dfs.support.append=false". Also the master web ui
> (on
> >>> >>> port 60010) will print an error message regarding that.
> >>> >>>
> >>> >>> If it's all ok, then you should take a look at the master log when
> it
> >>> >>> does the log splitting and see if it contains any obvious errors.
> >>> >>>
> >>> >>> J-D
> >>> >>>
> >>> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <
> >>> erdem.agaoglu@gmail.com>
> >>> >>> wrote:
> >>> >>> > Hi all,
> >>> >>> >
> >>> >>> > We have a testing cluster of 6 nodes which we try to run an
> >>> >>> HBase/MapReduce
> >>> >>> > application on. In order to simulate a power failure we kill -9ed
> >>> all
> >>> >>> things
> >>> >>> > hadoop related on one of the slave nodes (DataNode, RegionServer,
> >>> >>> > TaskTracker, ZK quorum peer and i think SecondaryNameNode was on
> >>> this
> >>> >>> node
> >>> >>> > too). We were expecting a smooth transition on all services but
> were
> >>> >>> unable
> >>> >>> > to get on HBase end. While our regions seemed intact (not
> >>> confirmed), we
> >>> >>> > lost table definitions that pointed some kind of META region
> fail.
> >>> So
> >>> >>> our
> >>> >>> > application failed with several TableNotFoundExceptions.
> Simulation
> >>> was
> >>> >>> > conducted with no-load and extremely small data (like 10 rows in
> 3
> >>> >>> tables).
> >>> >>> >
> >>> >>> > On our setup, HBase is 0.89.20100924, r1001068 while Hadoop
> >>> >>> > runs 0.20.3-append-r964955-1240, r960957. Most of the
> configuration
> >>> >>> > parameters are in default.
> >>> >>> >
> >>> >>> > If we did something wrong up to this point, please ignore the
> rest
> >>> of
> >>> >>> the
> >>> >>> > message as i'll try to explain what we did to reproduce it and
> might
> >>> be
> >>> >>> > irrelevant.
> >>> >>> >
> >>> >>> > Say the machines are named A, B, C, D, E, F; where A is
> master-like
> >>> >>> node,
> >>> >>> > others are slaves and power fail is on F. Since we have little
> data,
> >>> we
> >>> >>> have
> >>> >>> > one ROOT and only one META region. I'll try to sum up the whole
> >>> >>> scenario.
> >>> >>> >
> >>> >>> > A: NN, DN, JT, TT, HM, RS
> >>> >>> > B: DN, TT, RS, ZK
> >>> >>> > C: DN, TT, RS, ZK
> >>> >>> > D: DN, TT, RS, ZK
> >>> >>> > E: DN, TT, RS, ZK
> >>> >>> > F: SNN, DN, TT, RS, ZK
> >>> >>> >
> >>> >>> > 0. Initial state -> ROOT: F, META: A
> >>> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables, waited for
> >>> about
> >>> >>> half
> >>> >>> > an hour to get nothing BTW
> >>> >>> > 2. Put F back online -> No effect
> >>> >>> > 3. Create a table 'testtable' to see if we lose it
> >>> >>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
> >>> >>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
> >>> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost
> >>> >>> 'testtable'
> >>> >>> > but get our tables from before the simulation. It seemed like
> >>> because A
> >>> >>> had
> >>> >>> > META before the simulation, the table definitions were revived.
> >>> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost 2
> out
> >>> of
> >>> >>> our
> >>> >>> > original 6 tables, 'testtable' revived. That small data seems
> >>> corrupted
> >>> >>> too
> >>> >>> > as our Scans don't finish.
> >>> >>> > 8. Run to mailing-list.
> >>> >>> >
> >>> >>> > First of all thanks for reading up to this point. From what we
> are
> >>> now,
> >>> >>> we
> >>> >>> > are not even sure if this is the expected behavior, like if ROOT
> or
> >>> META
> >>> >>> > region dies we lose data and must do sth like hbck, or if we are
> >>> missing
> >>> >>> a
> >>> >>> > configuration, or if this is a bug. No need to mention that we
> are
> >>> >>> > relatively new to HBase so the last possibility is that if we
> didn't
> >>> >>> > understand it at all.
> >>> >>> >
> >>> >>> > Thanks in advance for any ideas.
> >>> >>> >
> >>> >>> > --
> >>> >>> > erdem agaoglu
> >>> >>> >
> >>> >>>
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> erdem agaoglu
> >>> >>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > erdem agaoglu
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> erdem agaoglu
> >>
> >
> >
> >
> > --
> > erdem agaoglu
> >
>



-- 
erdem agaoglu

Re: Node failure causes weird META data?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
This confirms my suspicion that -ROOT-'s latest data was lost and that
.META. got double assigned, which easily explains all the weirdness
you're seeing after the log replay.

The most probable reason would be misconfiguration, since the append
feature is very well tested and a simple kill -9 wouldn't cause all
sorts of issues. Else, it could be caused by an HBase bug, in which
case exceptions would probably come out either during log splitting or
during log replay (which happens when the region is reassigned to
another region server). In your case you could take a look at the RS
log where -ROOT- ends up and see if there's anything weird.

To test whether append works correctly, in order to rule that out, just
bring up your cluster and immediately kill -9 the RS that's holding
ROOT. Once the master has done the log splitting, do a lookup directly in
hadoop (bin/hadoop fs -cat) on that log file which should be under
/hbase/.oldlogs/serverstring/somelog and look for that -ROOT- entry.
If it's empty or missing, it's probably a misconfiguration.

J-D

On Tue, Nov 2, 2010 at 9:02 AM, Erdem Agaoglu <er...@gmail.com> wrote:
> We reproduced the problem by kill -9ing the region server that hosts -ROOT-
> again. Results are the same with what i have explained before. After log
> splitting metaScanner complained that .META. is not valid and reassigned it
> to somewhere else. As a result, the number of rows in .META. reduced from 24
> to 16 meaning we lost access to some regions and some table definitions. To
> support what you have said about the data in the memstore, we seem to have
> lost only recent regions.
>
> status 'detailed' showed no regionsInTransition, and showed a total of 39
> regions assigned to region servers, but as i've said before, we have access
> to only 16 of them. No need to mention our queries fail.
>
> Any idea we could try is greatly appreciated. Thanks in advance.
>
> PS. Something probably irrelevant happened this time, interface 60010 lists
> all our tables while 'list' command in shell lists only a few. We get
> TableNotFoundExceptions on tables shown in web interface but missing in
> 'list' command output. Seems like web interface is able to list inaccessible
> tables.
>
> On Tue, Nov 2, 2010 at 9:56 AM, Erdem Agaoglu <er...@gmail.com>wrote:
>
>> By "lost tables" i mean no user table definitions were listed in interface
>> 60010 and my queries got me TableNotFoundExceptions. Routine BaseScanner
>> logged like
>>
>> # RegionManager.metaScanner scan of 0 row(s) of meta region ... complete
>>
>> so i guess my .META. was empty. But unfortunately I don't know if any
>> regions were stuck in transition, i'll be sure to check this out while i try
>> to reproduce.
>>
>> I rechecked the logs, specifically after the splitting completed and it
>> contains lines like "Current assignment is not valid..." so i guess this is
>> something unexpected. But in hopes of some configuration error you may spot,
>> whole log goes like that (stripped to be more readable):
>>
>> # hlog file splitting completed in 80372 millis
>> # Log split complete, meta reassignment and scanning:
>> # ProcessServerShutdown reassigning ROOT region
>> # -ROOT- region unset (but not set to be reassigned)
>> # ROOT inserted into regionsInTransition
>> # Assigning for serverName=C...  total nregions to assign=1, regions to
>> give other servers than this=0, isMetaAssign=true
>> # Assigning serverName=C... 1 regions
>> # Assigning region -ROOT-... to C...
>> # Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
>> # RegionManager.rootScanner scanning meta region {server: C ...
>> # Current assignment of .META.... is not valid;  serverAddress=A,
>> startCode=blah unknown; checking once more!
>> # Current assignment of .META.... is not valid;  serverAddress=A,
>> startCode=blah unknown.
>> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C ...
>> complete
>> # RegionManager.rootScanner scanning meta region {server: C ...
>> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C ...
>> complete
>> # Assigning for serverName=E...  total nregions to assign=-1, regions to
>> give other servers than this=2, isMetaAssign=true
>> # Assigning serverName=E... -1 regions
>> # Assigning region .META.... to E...
>> # Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
>> # Processing todo: PendingOpenOperation from E...
>> # .META.... open on E
>> # Updated row .META.... in region -ROOT- ...
>> # Adding to onlineMetaRegions: {server: E ...
>> # RegionManager.metaScanner scanning meta region {server: E ...
>> # RegionManager.metaScanner scan of 0 row(s) of meta region {server: E ...
>> # All 1 .META. region(s) scanned
>> 10 secs later
>> # Processing todo: ProcessServerShutdown of F
>> # Process shutdown of server F...: logSplit: true, rootRescanned: false,
>> numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
>> # Log split complete, meta reassignment and scanning
>> # Process server shutdown scanning root region on C
>> # Process server shutdown scanning root region on C finished master
>> # process server shutdown scanning .META.,,1 on E
>> # process server shutdown finished scanning .META.,,1 on E
>> # Removed F... from deadservers Map
>>
>> Thanks again.
>>
>> On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>>
>>> The fact that the tables are "revived" is a clue here IMO, but let's
>>> go back to more basic questions...
>>>
>>> So when you say that during step 1 you lost tables, what do you mean
>>> by "lost"? Were the rows of those tables still in .META.? Were the
>>> regions stuck in transition (in the shell, do: status 'detailed')? Or
>>> when you tried to query them you just got a TableNotFoundException?
>>>
>>> Also, the fact that only -ROOT- and not .META. was on this region
>>> server means that if there was any data lost, it would be .META.'s
>>> location and it would have been assigned somewhere else (E), but still
>>> stayed assigned on A. Since the data is in the memstore, recent data
>>> couldn't be read by this second assignment of .META. but... it could
>>> also be reassigned for a "normal" reason like rebalancing. The best
>>> way to confirm that is when the -ROOT- region gets reassigned at the
>>> end of step 1 (so this is after the message that goes like "...file
>>> splitting completed in 80372..."), do so see something like this in
>>> the master's log: "Current assignment of .META.,,some_timestamp  is
>>> not valid; serverAddress=blah, startcode=blah unknown."? If so, then
>>> it seems that data was lost and this is really unexpected.
>>>
>>> J-D
>>>
>>> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <er...@gmail.com>
>>> wrote:
>>> > Hi again,
>>> >
>>> > I have re-checked our configuration to confirm that we have
>>> > dfs.support.append enabled and saw "Using syncFs -- HDFS-200" in logs. I
>>> > inspected logs around log splits to find something, but i'm not sure
>>> that's
>>> > what we are looking for. In the first step of the scenario i mentioned
>>> > before (where we kill -9ed everything on the node that hosts the ROOT
>>> > region), HLog says (stripping hdfs:// prefixes and hostnames for
>>> clarity)
>>> >
>>> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
>>> >
>>> > Then it goes over every single one like
>>> >
>>> > # Splitting hlog 1 of 7
>>> > # Splitting hlog 2 of 7
>>> > # ...
>>> > # Splitting hlog 7 of 7
>>> >
>>> > On the 7th hlog it WARNs with two lines
>>> >
>>> > # File .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546
>>> might
>>> > be still open, length is 0
>>> > # Could not open .logs/F,60020,1287491528908/10.1.10.229
>>> %3A60020.1288021443546
>>> > for reading. File is emptyjava.io.EOFException
>>> >
>>> > And completes with
>>> >
>>> > # log file splitting completed in 80372 millis for
>>> > .logs/F,60020,1287491528908
>>> >
>>> > This might be it, but on the sixth step (where we kill -9ed the
>>> RegionServer
>>> > that hosts the only META region), it splits 2 hlogs without any empty
>>> file
>>> > problems nor any log above INFO, but as i told before, our testtable
>>> still
>>> > got lost.
>>> >
>>> > I'll try to reproduce the problem in a cleaner way, but in the meantime,
>>> any
>>> > kind of pointers to any problems we might have is greatly appreciated.
>>> >
>>> >
>>> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <erdem.agaoglu@gmail.com
>>> >wrote:
>>> >
>>> >> Thanks for the answer.
>>> >>
>>> >> I am pretty sure we have dfs.support.append enabled. I remember both
>>> the
>>> >> conf file and the logs, and don't recall seeing any errors on 60010. I
>>> >> crawled through logs all yesterday but don't remember anything
>>> indicating a
>>> >> specific error too. But i'm not sure about that. Let me check that and
>>> get
>>> >> back here on monday.
>>> >>
>>> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <
>>> jdcryans@apache.org>wrote:
>>> >>
>>> >>> First thing I'd check is if your configuration has dfs.support.append,
>>> >>> you can confirm this by looking at your region server logs. When a RS
>>> >>> starts, it creates an HLog and will print out: "Using syncFs --
>>> >>> HDFS-200" if it's configured, else you'll see "syncFs -- HDFS-200 --
>>> >>> not available, dfs.support.append=false". Also the master web ui (on
>>> >>> port 60010) will print an error message regarding that.
>>> >>>
>>> >>> If it's all ok, then you should take a look at the master log when it
>>> >>> does the log splitting and see if it contains any obvious errors.
>>> >>>
>>> >>> J-D
>>> >>>
>>> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <
>>> erdem.agaoglu@gmail.com>
>>> >>> wrote:
>>> >>> > Hi all,
>>> >>> >
>>> >>> > We have a testing cluster of 6 nodes which we try to run an
>>> >>> HBase/MapReduce
>>> >>> > application on. In order to simulate a power failure we kill -9ed
>>> all
>>> >>> things
>>> >>> > hadoop related on one of the slave nodes (DataNode, RegionServer,
>>> >>> > TaskTracker, ZK quorum peer and i think SecondaryNameNode was on
>>> this
>>> >>> node
>>> >>> > too). We were expecting a smooth transition on all services but were
>>> >>> unable
>>> >>> > to get on HBase end. While our regions seemed intact (not
>>> confirmed), we
>>> >>> > lost table definitions that pointed some kind of META region fail.
>>> So
>>> >>> our
>>> >>> > application failed with several TableNotFoundExceptions. Simulation
>>> was
>>> >>> > conducted with no-load and extremely small data (like 10 rows in 3
>>> >>> tables).
>>> >>> >
>>> >>> > On our setup, HBase is 0.89.20100924, r1001068 while Hadoop
>>> >>> > runs 0.20.3-append-r964955-1240, r960957. Most of the configuration
>>> >>> > parameters are in default.
>>> >>> >
>>> >>> > If we did something wrong up to this point, please ignore the rest
>>> of
>>> >>> the
>>> >>> > message as i'll try to explain what we did to reproduce it and might
>>> be
>>> >>> > irrelevant.
>>> >>> >
>>> >>> > Say the machines are named A, B, C, D, E, F; where A is master-like
>>> >>> node,
>>> >>> > others are slaves and power fail is on F. Since we have little data,
>>> we
>>> >>> have
>>> >>> > one ROOT and only one META region. I'll try to sum up the whole
>>> >>> scenario.
>>> >>> >
>>> >>> > A: NN, DN, JT, TT, HM, RS
>>> >>> > B: DN, TT, RS, ZK
>>> >>> > C: DN, TT, RS, ZK
>>> >>> > D: DN, TT, RS, ZK
>>> >>> > E: DN, TT, RS, ZK
>>> >>> > F: SNN, DN, TT, RS, ZK
>>> >>> >
>>> >>> > 0. Initial state -> ROOT: F, META: A
>>> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables, waited for
>>> about
>>> >>> half
>>> >>> > an hour to get nothing BTW
>>> >>> > 2. Put F back online -> No effect
>>> >>> > 3. Create a table 'testtable' to see if we lose it
>>> >>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
>>> >>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
>>> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost
>>> >>> 'testtable'
>>> >>> > but get our tables from before the simulation. It seemed like
>>> because A
>>> >>> had
>>> >>> > META before the simulation, the table definitions were revived.
>>> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost 2 out
>>> of
>>> >>> our
>>> >>> > original 6 tables, 'testtable' revived. That small data seems
>>> corrupted
>>> >>> too
>>> >>> > as our Scans don't finish.
>>> >>> > 8. Run to mailing-list.
>>> >>> >
>>> >>> > First of all thanks for reading up to this point. From what we are
>>> now,
>>> >>> we
>>> >>> > are not even sure if this is the expected behavior, like if ROOT or
>>> META
>>> >>> > region dies we lose data and must do sth like hbck, or if we are
>>> missing
>>> >>> a
>>> >>> > configuration, or if this is a bug. No need to mention that we are
>>> >>> > relatively new to HBase so the last possibility is that if we didn't
>>> >>> > understand it at all.
>>> >>> >
>>> >>> > Thanks in advance for any ideas.
>>> >>> >
>>> >>> > --
>>> >>> > erdem agaoglu
>>> >>> >
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> erdem agaoglu
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > erdem agaoglu
>>> >
>>>
>>
>>
>>
>> --
>> erdem agaoglu
>>
>
>
>
> --
> erdem agaoglu
>

Re: Node failure causes weird META data?

Posted by Erdem Agaoglu <er...@gmail.com>.
We reproduced the problem by kill -9ing the region server that hosts -ROOT-
again. The results are the same as what I explained before. After log
splitting, the metaScanner complained that .META. is not valid and reassigned
it somewhere else. As a result, the number of rows in .META. dropped from 24
to 16, meaning we lost access to some regions and some table definitions.
Supporting what you said about the data in the memstore, we seem to have
lost only the recent regions.

status 'detailed' showed no regionsInTransition, and showed a total of 39
regions assigned to region servers, but as I've said before, we have access
to only 16 of them. No need to mention that our queries fail.
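
For completeness, the comparison above can be done directly from the shell;
something like the following should show which regions .META. still lists
(with their assignments) next to what the region servers report, assuming the
usual info: columns:

  hbase> scan '.META.', {COLUMNS => ['info:regioninfo', 'info:server']}
  hbase> status 'detailed'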

Any idea we could try is greatly appreciated. Thanks in advance.

PS. Something probably irrelevant happened this time: the interface on 60010
lists all our tables, while the 'list' command in the shell lists only a few.
We get TableNotFoundExceptions on tables that are shown in the web interface
but missing from the 'list' output. It seems the web interface is able to
list inaccessible tables.

On Tue, Nov 2, 2010 at 9:56 AM, Erdem Agaoglu <er...@gmail.com>wrote:

> By "lost tables" i mean no user table definitions were listed in interface
> 60010 and my queries got me TableNotFoundExceptions. Routine BaseScanner
> logged like
>
> # RegionManager.metaScanner scan of 0 row(s) of meta region ... complete
>
> so i guess my .META. was empty. But unfortunately I don't know if any
> regions were stuck in transition, i'll be sure to check this out while i try
> to reproduce.
>
> I rechecked the logs, specifically after the splitting completed and it
> contains lines like "Current assignment is not valid..." so i guess this is
> something unexpected. But in hopes of some configuration error you may spot,
> whole log goes like that (stripped to be more readable):
>
> # hlog file splitting completed in 80372 millis
> # Log split complete, meta reassignment and scanning:
> # ProcessServerShutdown reassigning ROOT region
> # -ROOT- region unset (but not set to be reassigned)
> # ROOT inserted into regionsInTransition
> # Assigning for serverName=C...  total nregions to assign=1, regions to
> give other servers than this=0, isMetaAssign=true
> # Assigning serverName=C... 1 regions
> # Assigning region -ROOT-... to C...
> # Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
> # RegionManager.rootScanner scanning meta region {server: C ...
> # Current assignment of .META.... is not valid;  serverAddress=A,
> startCode=blah unknown; checking once more!
> # Current assignment of .META.... is not valid;  serverAddress=A,
> startCode=blah unknown.
> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C ...
> complete
> # RegionManager.rootScanner scanning meta region {server: C ...
> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C ...
> complete
> # Assigning for serverName=E...  total nregions to assign=-1, regions to
> give other servers than this=2, isMetaAssign=true
> # Assigning serverName=E... -1 regions
> # Assigning region .META.... to E...
> # Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
> # Processing todo: PendingOpenOperation from E...
> # .META.... open on E
> # Updated row .META.... in region -ROOT- ...
> # Adding to onlineMetaRegions: {server: E ...
> # RegionManager.metaScanner scanning meta region {server: E ...
> # RegionManager.metaScanner scan of 0 row(s) of meta region {server: E ...
> # All 1 .META. region(s) scanned
> 10 secs later
> # Processing todo: ProcessServerShutdown of F
> # Process shutdown of server F...: logSplit: true, rootRescanned: false,
> numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> # Log split complete, meta reassignment and scanning
> # Process server shutdown scanning root region on C
> # Process server shutdown scanning root region on C finished master
> # process server shutdown scanning .META.,,1 on E
> # process server shutdown finished scanning .META.,,1 on E
> # Removed F... from deadservers Map
>
> Thanks again.
>
> On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> The fact that the tables are "revived" is a clue here IMO, but let's
>> go back to more basic questions...
>>
>> So when you say that during step 1 you lost tables, what do you mean
>> by "lost"? Were the rows of those tables still in .META.? Were the
>> regions stuck in transition (in the shell, do: status 'detailed')? Or
>> when you tried to query them you just got a TableNotFoundException?
>>
>> Also, the fact that only -ROOT- and not .META. was on this region
>> server means that if there was any data lost, it would be .META.'s
>> location and it would have been assigned somewhere else (E), but still
>> stayed assigned on A. Since the data is in the memstore, recent data
>> couldn't be read by this second assignment of .META. but... it could
>> also be reassigned for a "normal" reason like rebalancing. The best
>> way to confirm that is when the -ROOT- region gets reassigned at the
>> end of step 1 (so this is after the message that goes like "...file
>> splitting completed in 80372..."), do so see something like this in
>> the master's log: "Current assignment of .META.,,some_timestamp  is
>> not valid; serverAddress=blah, startcode=blah unknown."? If so, then
>> it seems that data was lost and this is really unexpected.
>>
>> J-D
>>
>> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <er...@gmail.com>
>> wrote:
>> > Hi again,
>> >
>> > I have re-checked our configuration to confirm that we have
>> > dfs.support.append enabled and saw "Using syncFs -- HDFS-200" in logs. I
>> > inspected logs around log splits to find something, but i'm not sure
>> that's
>> > what we are looking for. In the first step of the scenario i mentioned
>> > before (where we kill -9ed everything on the node that hosts the ROOT
>> > region), HLog says (stripping hdfs:// prefixes and hostnames for
>> clarity)
>> >
>> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
>> >
>> > Then it goes over every single one like
>> >
>> > # Splitting hlog 1 of 7
>> > # Splitting hlog 2 of 7
>> > # ...
>> > # Splitting hlog 7 of 7
>> >
>> > On the 7th hlog it WARNs with two lines
>> >
>> > # File .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546
>> might
>> > be still open, length is 0
>> > # Could not open .logs/F,60020,1287491528908/10.1.10.229
>> %3A60020.1288021443546
>> > for reading. File is emptyjava.io.EOFException
>> >
>> > And completes with
>> >
>> > # log file splitting completed in 80372 millis for
>> > .logs/F,60020,1287491528908
>> >
>> > This might be it, but on the sixth step (where we kill -9ed the
>> RegionServer
>> > that hosts the only META region), it splits 2 hlogs without any empty
>> file
>> > problems nor any log above INFO, but as i told before, our testtable
>> still
>> > got lost.
>> >
>> > I'll try to reproduce the problem in a cleaner way, but in the meantime,
>> any
>> > kind of pointers to any problems we might have is greatly appreciated.
>> >
>> >
>> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <erdem.agaoglu@gmail.com
>> >wrote:
>> >
>> >> Thanks for the answer.
>> >>
>> >> I am pretty sure we have dfs.support.append enabled. I remember both
>> the
>> >> conf file and the logs, and don't recall seeing any errors on 60010. I
>> >> crawled through logs all yesterday but don't remember anything
>> indicating a
>> >> specific error too. But i'm not sure about that. Let me check that and
>> get
>> >> back here on monday.
>> >>
>> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <
>> jdcryans@apache.org>wrote:
>> >>
>> >>> First thing I'd check is if your configuration has dfs.support.append,
>> >>> you can confirm this by looking at your region server logs. When a RS
>> >>> starts, it creates an HLog and will print out: "Using syncFs --
>> >>> HDFS-200" if it's configured, else you'll see "syncFs -- HDFS-200 --
>> >>> not available, dfs.support.append=false". Also the master web ui (on
>> >>> port 60010) will print an error message regarding that.
>> >>>
>> >>> If it's all ok, then you should take a look at the master log when it
>> >>> does the log splitting and see if it contains any obvious errors.
>> >>>
>> >>> J-D
>> >>>
>> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <
>> erdem.agaoglu@gmail.com>
>> >>> wrote:
>> >>> > Hi all,
>> >>> >
>> >>> > We have a testing cluster of 6 nodes which we try to run an
>> >>> HBase/MapReduce
>> >>> > application on. In order to simulate a power failure we kill -9ed
>> all
>> >>> things
>> >>> > hadoop related on one of the slave nodes (DataNode, RegionServer,
>> >>> > TaskTracker, ZK quorum peer and i think SecondaryNameNode was on
>> this
>> >>> node
>> >>> > too). We were expecting a smooth transition on all services but were
>> >>> unable
>> >>> > to get on HBase end. While our regions seemed intact (not
>> confirmed), we
>> >>> > lost table definitions that pointed some kind of META region fail.
>> So
>> >>> our
>> >>> > application failed with several TableNotFoundExceptions. Simulation
>> was
>> >>> > conducted with no-load and extremely small data (like 10 rows in 3
>> >>> tables).
>> >>> >
>> >>> > On our setup, HBase is 0.89.20100924, r1001068 while Hadoop
>> >>> > runs 0.20.3-append-r964955-1240, r960957. Most of the configuration
>> >>> > parameters are in default.
>> >>> >
>> >>> > If we did something wrong up to this point, please ignore the rest
>> of
>> >>> the
>> >>> > message as i'll try to explain what we did to reproduce it and might
>> be
>> >>> > irrelevant.
>> >>> >
>> >>> > Say the machines are named A, B, C, D, E, F; where A is master-like
>> >>> node,
>> >>> > others are slaves and power fail is on F. Since we have little data,
>> we
>> >>> have
>> >>> > one ROOT and only one META region. I'll try to sum up the whole
>> >>> scenario.
>> >>> >
>> >>> > A: NN, DN, JT, TT, HM, RS
>> >>> > B: DN, TT, RS, ZK
>> >>> > C: DN, TT, RS, ZK
>> >>> > D: DN, TT, RS, ZK
>> >>> > E: DN, TT, RS, ZK
>> >>> > F: SNN, DN, TT, RS, ZK
>> >>> >
>> >>> > 0. Initial state -> ROOT: F, META: A
>> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables, waited for
>> about
>> >>> half
>> >>> > an hour to get nothing BTW
>> >>> > 2. Put F back online -> No effect
>> >>> > 3. Create a table 'testtable' to see if we lose it
>> >>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
>> >>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
>> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost
>> >>> 'testtable'
>> >>> > but get our tables from before the simulation. It seemed like
>> because A
>> >>> had
>> >>> > META before the simulation, the table definitions were revived.
>> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost 2 out
>> of
>> >>> our
>> >>> > original 6 tables, 'testtable' revived. That small data seems
>> corrupted
>> >>> too
>> >>> > as our Scans don't finish.
>> >>> > 8. Run to mailing-list.
>> >>> >
>> >>> > First of all thanks for reading up to this point. From what we are
>> now,
>> >>> we
>> >>> > are not even sure if this is the expected behavior, like if ROOT or
>> META
>> >>> > region dies we lose data and must do sth like hbck, or if we are
>> missing
>> >>> a
>> >>> > configuration, or if this is a bug. No need to mention that we are
>> >>> > relatively new to HBase so the last possibility is that if we didn't
>> >>> > understand it at all.
>> >>> >
>> >>> > Thanks in advance for any ideas.
>> >>> >
>> >>> > --
>> >>> > erdem agaoglu
>> >>> >
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> erdem agaoglu
>> >>
>> >
>> >
>> >
>> > --
>> > erdem agaoglu
>> >
>>
>
>
>
> --
> erdem agaoglu
>



-- 
erdem agaoglu

Re: Node failure causes weird META data?

Posted by Erdem Agaoglu <er...@gmail.com>.
By "lost tables" i mean no user table definitions were listed in interface
60010 and my queries got me TableNotFoundExceptions. Routine BaseScanner
logged like

# RegionManager.metaScanner scan of 0 row(s) of meta region ... complete

so I guess my .META. was empty. Unfortunately I don't know whether any
regions were stuck in transition; I'll be sure to check this while I try
to reproduce.

I rechecked the logs, specifically after the splitting completed, and they
contain lines like "Current assignment ... is not valid", so I guess this is
something unexpected. But in the hope that you may spot some configuration
error, the whole log goes like this (stripped to be more readable):

# hlog file splitting completed in 80372 millis
# Log split complete, meta reassignment and scanning:
# ProcessServerShutdown reassigning ROOT region
# -ROOT- region unset (but not set to be reassigned)
# ROOT inserted into regionsInTransition
# Assigning for serverName=C...  total nregions to assign=1, regions to give
other servers than this=0, isMetaAssign=true
# Assigning serverName=C... 1 regions
# Assigning region -ROOT-... to C...
# Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
# RegionManager.rootScanner scanning meta region {server: C ...
# Current assignment of .META.... is not valid;  serverAddress=A,
startCode=blah unknown; checking once more!
# Current assignment of .META.... is not valid;  serverAddress=A,
startCode=blah unknown.
# RegionManager.rootScanner scan of 1 row(s) of meta region {server: C ...
complete
# RegionManager.rootScanner scanning meta region {server: C ...
# RegionManager.rootScanner scan of 1 row(s) of meta region {server: C ...
complete
# Assigning for serverName=E...  total nregions to assign=-1, regions to
give other servers than this=2, isMetaAssign=true
# Assigning serverName=E... -1 regions
# Assigning region .META.... to E...
# Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
# Processing todo: PendingOpenOperation from E...
# .META.... open on E
# Updated row .META.... in region -ROOT- ...
# Adding to onlineMetaRegions: {server: E ...
# RegionManager.metaScanner scanning meta region {server: E ...
# RegionManager.metaScanner scan of 0 row(s) of meta region {server: E ...
# All 1 .META. region(s) scanned
10 secs later
# Processing todo: ProcessServerShutdown of F
# Process shutdown of server F...: logSplit: true, rootRescanned: false,
numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
# Log split complete, meta reassignment and scanning
# Process server shutdown scanning root region on C
# Process server shutdown scanning root region on C finished master
# process server shutdown scanning .META.,,1 on E
# process server shutdown finished scanning .META.,,1 on E
# Removed F... from deadservers Map

Thanks again.

On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> The fact that the tables are "revived" is a clue here IMO, but let's
> go back to more basic questions...
>
> So when you say that during step 1 you lost tables, what do you mean
> by "lost"? Were the rows of those tables still in .META.? Were the
> regions stuck in transition (in the shell, do: status 'detailed')? Or
> when you tried to query them you just got a TableNotFoundException?
>
> Also, the fact that only -ROOT- and not .META. was on this region
> server means that if there was any data lost, it would be .META.'s
> location and it would have been assigned somewhere else (E), but still
> stayed assigned on A. Since the data is in the memstore, recent data
> couldn't be read by this second assignment of .META. but... it could
> also be reassigned for a "normal" reason like rebalancing. The best
> way to confirm that is when the -ROOT- region gets reassigned at the
> end of step 1 (so this is after the message that goes like "...file
> splitting completed in 80372..."), do so see something like this in
> the master's log: "Current assignment of .META.,,some_timestamp  is
> not valid; serverAddress=blah, startcode=blah unknown."? If so, then
> it seems that data was lost and this is really unexpected.
>
> J-D
>
> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <er...@gmail.com>
> wrote:
> > Hi again,
> >
> > I have re-checked our configuration to confirm that we have
> > dfs.support.append enabled and saw "Using syncFs -- HDFS-200" in logs. I
> > inspected logs around log splits to find something, but i'm not sure
> that's
> > what we are looking for. In the first step of the scenario i mentioned
> > before (where we kill -9ed everything on the node that hosts the ROOT
> > region), HLog says (stripping hdfs:// prefixes and hostnames for clarity)
> >
> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
> >
> > Then it goes over every single one like
> >
> > # Splitting hlog 1 of 7
> > # Splitting hlog 2 of 7
> > # ...
> > # Splitting hlog 7 of 7
> >
> > On the 7th hlog it WARNs with two lines
> >
> > # File .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546
> might
> > be still open, length is 0
> > # Could not open .logs/F,60020,1287491528908/10.1.10.229
> %3A60020.1288021443546
> > for reading. File is emptyjava.io.EOFException
> >
> > And completes with
> >
> > # log file splitting completed in 80372 millis for
> > .logs/F,60020,1287491528908
> >
> > This might be it, but on the sixth step (where we kill -9ed the
> RegionServer
> > that hosts the only META region), it splits 2 hlogs without any empty
> file
> > problems nor any log above INFO, but as i told before, our testtable
> still
> > got lost.
> >
> > I'll try to reproduce the problem in a cleaner way, but in the meantime,
> any
> > kind of pointers to any problems we might have is greatly appreciated.
> >
> >
> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <erdem.agaoglu@gmail.com
> >wrote:
> >
> >> Thanks for the answer.
> >>
> >> I am pretty sure we have dfs.support.append enabled. I remember both the
> >> conf file and the logs, and don't recall seeing any errors on 60010. I
> >> crawled through logs all yesterday but don't remember anything
> indicating a
> >> specific error too. But i'm not sure about that. Let me check that and
> get
> >> back here on monday.
> >>
> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >>
> >>> First thing I'd check is if your configuration has dfs.support.append,
> >>> you can confirm this by looking at your region server logs. When a RS
> >>> starts, it creates an HLog and will print out: "Using syncFs --
> >>> HDFS-200" if it's configured, else you'll see "syncFs -- HDFS-200 --
> >>> not available, dfs.support.append=false". Also the master web ui (on
> >>> port 60010) will print an error message regarding that.
> >>>
> >>> If it's all ok, then you should take a look at the master log when it
> >>> does the log splitting and see if it contains any obvious errors.
> >>>
> >>> J-D
> >>>
> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <
> erdem.agaoglu@gmail.com>
> >>> wrote:
> >>> > Hi all,
> >>> >
> >>> > We have a testing cluster of 6 nodes which we try to run an
> >>> HBase/MapReduce
> >>> > application on. In order to simulate a power failure we kill -9ed all
> >>> things
> >>> > hadoop related on one of the slave nodes (DataNode, RegionServer,
> >>> > TaskTracker, ZK quorum peer and i think SecondaryNameNode was on this
> >>> node
> >>> > too). We were expecting a smooth transition on all services but were
> >>> unable
> >>> > to get on HBase end. While our regions seemed intact (not confirmed),
> we
> >>> > lost table definitions that pointed some kind of META region fail. So
> >>> our
> >>> > application failed with several TableNotFoundExceptions. Simulation
> was
> >>> > conducted with no-load and extremely small data (like 10 rows in 3
> >>> tables).
> >>> >
> >>> > On our setup, HBase is 0.89.20100924, r1001068 while Hadoop
> >>> > runs 0.20.3-append-r964955-1240, r960957. Most of the configuration
> >>> > parameters are in default.
> >>> >
> >>> > If we did something wrong up to this point, please ignore the rest of
> >>> the
> >>> > message as i'll try to explain what we did to reproduce it and might
> be
> >>> > irrelevant.
> >>> >
> >>> > Say the machines are named A, B, C, D, E, F; where A is master-like
> >>> node,
> >>> > others are slaves and power fail is on F. Since we have little data,
> we
> >>> have
> >>> > one ROOT and only one META region. I'll try to sum up the whole
> >>> scenario.
> >>> >
> >>> > A: NN, DN, JT, TT, HM, RS
> >>> > B: DN, TT, RS, ZK
> >>> > C: DN, TT, RS, ZK
> >>> > D: DN, TT, RS, ZK
> >>> > E: DN, TT, RS, ZK
> >>> > F: SNN, DN, TT, RS, ZK
> >>> >
> >>> > 0. Initial state -> ROOT: F, META: A
> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables, waited for
> about
> >>> half
> >>> > an hour to get nothing BTW
> >>> > 2. Put F back online -> No effect
> >>> > 3. Create a table 'testtable' to see if we lose it
> >>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
> >>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost
> >>> 'testtable'
> >>> > but get our tables from before the simulation. It seemed like because
> A
> >>> had
> >>> > META before the simulation, the table definitions were revived.
> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost 2 out
> of
> >>> our
> >>> > original 6 tables, 'testtable' revived. That small data seems
> corrupted
> >>> too
> >>> > as our Scans don't finish.
> >>> > 8. Run to mailing-list.
> >>> >
> >>> > First of all thanks for reading up to this point. From what we are
> now,
> >>> we
> >>> > are not even sure if this is the expected behavior, like if ROOT or
> META
> >>> > region dies we lose data and must do sth like hbck, or if we are
> missing
> >>> a
> >>> > configuration, or if this is a bug. No need to mention that we are
> >>> > relatively new to HBase so the last possibility is that if we didn't
> >>> > understand it at all.
> >>> >
> >>> > Thanks in advance for any ideas.
> >>> >
> >>> > --
> >>> > erdem agaoglu
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> erdem agaoglu
> >>
> >
> >
> >
> > --
> > erdem agaoglu
> >
>



-- 
erdem agaoglu

Re: Node failure causes weird META data?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
The fact that the tables are "revived" is a clue here IMO, but let's
go back to more basic questions...

So when you say that during step 1 you lost tables, what do you mean
by "lost"? Were the rows of those tables still in .META.? Were the
regions stuck in transition (in the shell, do: status 'detailed')? Or
when you tried to query them you just got a TableNotFoundException?

Also, the fact that only -ROOT- and not .META. was on this region
server means that if there was any data lost, it would be .META.'s
location and it would have been assigned somewhere else (E), but still
stayed assigned on A. Since the data is in the memstore, recent data
couldn't be read by this second assignment of .META. but... it could
also be reassigned for a "normal" reason like rebalancing. The best
way to confirm that is when the -ROOT- region gets reassigned at the
end of step 1 (so this is after the message that goes like "...file
splitting completed in 80372..."), do you see something like this in
the master's log: "Current assignment of .META.,,some_timestamp  is
not valid; serverAddress=blah, startcode=blah unknown."? If so, then
it seems that data was lost and this is really unexpected.
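
Assuming the default master log location and naming, something as simple as
this should pull out the relevant lines around the split:

  $ grep -E 'splitting completed|Current assignment of .META|Assigning region' \
      $HBASE_HOME/logs/hbase-*-master-*.log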

J-D

On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <er...@gmail.com> wrote:
> Hi again,
>
> I have re-checked our configuration to confirm that we have
> dfs.support.append enabled and saw "Using syncFs -- HDFS-200" in logs. I
> inspected logs around log splits to find something, but i'm not sure that's
> what we are looking for. In the first step of the scenario i mentioned
> before (where we kill -9ed everything on the node that hosts the ROOT
> region), HLog says (stripping hdfs:// prefixes and hostnames for clarity)
>
> # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
>
> Then it goes over every single one like
>
> # Splitting hlog 1 of 7
> # Splitting hlog 2 of 7
> # ...
> # Splitting hlog 7 of 7
>
> On the 7th hlog it WARNs with two lines
>
> # File .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546 might
> be still open, length is 0
> # Could not open .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546
> for reading. File is emptyjava.io.EOFException
>
> And completes with
>
> # log file splitting completed in 80372 millis for
> .logs/F,60020,1287491528908
>
> This might be it, but on the sixth step (where we kill -9ed the RegionServer
> that hosts the only META region), it splits 2 hlogs without any empty file
> problems nor any log above INFO, but as i told before, our testtable still
> got lost.
>
> I'll try to reproduce the problem in a cleaner way, but in the meantime, any
> kind of pointers to any problems we might have is greatly appreciated.
>
>
> On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <er...@gmail.com>wrote:
>
>> Thanks for the answer.
>>
>> I am pretty sure we have dfs.support.append enabled. I remember both the
>> conf file and the logs, and don't recall seeing any errors on 60010. I
>> crawled through logs all yesterday but don't remember anything indicating a
>> specific error too. But i'm not sure about that. Let me check that and get
>> back here on monday.
>>
>> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>>
>>> First thing I'd check is if your configuration has dfs.support.append,
>>> you can confirm this by looking at your region server logs. When a RS
>>> starts, it creates an HLog and will print out: "Using syncFs --
>>> HDFS-200" if it's configured, else you'll see "syncFs -- HDFS-200 --
>>> not available, dfs.support.append=false". Also the master web ui (on
>>> port 60010) will print an error message regarding that.
>>>
>>> If it's all ok, then you should take a look at the master log when it
>>> does the log splitting and see if it contains any obvious errors.
>>>
>>> J-D
>>>
>>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <er...@gmail.com>
>>> wrote:
>>> > Hi all,
>>> >
>>> > We have a testing cluster of 6 nodes which we try to run an
>>> HBase/MapReduce
>>> > application on. In order to simulate a power failure we kill -9ed all
>>> things
>>> > hadoop related on one of the slave nodes (DataNode, RegionServer,
>>> > TaskTracker, ZK quorum peer and i think SecondaryNameNode was on this
>>> node
>>> > too). We were expecting a smooth transition on all services but were
>>> unable
>>> > to get on HBase end. While our regions seemed intact (not confirmed), we
>>> > lost table definitions that pointed some kind of META region fail. So
>>> our
>>> > application failed with several TableNotFoundExceptions. Simulation was
>>> > conducted with no-load and extremely small data (like 10 rows in 3
>>> tables).
>>> >
>>> > On our setup, HBase is 0.89.20100924, r1001068 while Hadoop
>>> > runs 0.20.3-append-r964955-1240, r960957. Most of the configuration
>>> > parameters are in default.
>>> >
>>> > If we did something wrong up to this point, please ignore the rest of
>>> the
>>> > message as i'll try to explain what we did to reproduce it and might be
>>> > irrelevant.
>>> >
>>> > Say the machines are named A, B, C, D, E, F; where A is master-like
>>> node,
>>> > others are slaves and power fail is on F. Since we have little data, we
>>> have
>>> > one ROOT and only one META region. I'll try to sum up the whole
>>> scenario.
>>> >
>>> > A: NN, DN, JT, TT, HM, RS
>>> > B: DN, TT, RS, ZK
>>> > C: DN, TT, RS, ZK
>>> > D: DN, TT, RS, ZK
>>> > E: DN, TT, RS, ZK
>>> > F: SNN, DN, TT, RS, ZK
>>> >
>>> > 0. Initial state -> ROOT: F, META: A
>>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables, waited for about
>>> half
>>> > an hour to get nothing BTW
>>> > 2. Put F back online -> No effect
>>> > 3. Create a table 'testtable' to see if we lose it
>>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
>>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
>>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost
>>> 'testtable'
>>> > but get our tables from before the simulation. It seemed like because A
>>> had
>>> > META before the simulation, the table definitions were revived.
>>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost 2 out of
>>> our
>>> > original 6 tables, 'testtable' revived. That small data seems corrupted
>>> too
>>> > as our Scans don't finish.
>>> > 8. Run to mailing-list.
>>> >
>>> > First of all thanks for reading up to this point. From what we are now,
>>> we
>>> > are not even sure if this is the expected behavior, like if ROOT or META
>>> > region dies we lose data and must do sth like hbck, or if we are missing
>>> a
>>> > configuration, or if this is a bug. No need to mention that we are
>>> > relatively new to HBase so the last possibility is that if we didn't
>>> > understand it at all.
>>> >
>>> > Thanks in advance for any ideas.
>>> >
>>> > --
>>> > erdem agaoglu
>>> >
>>>
>>
>>
>>
>> --
>> erdem agaoglu
>>
>
>
>
> --
> erdem agaoglu
>