You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Kirill Shabunov <e2...@yahoo.com> on 2009/06/05 13:49:44 UTC

Trying 0.19.4 RC1

Hi!

First of all, thank you guys for your hard work!

I am testing this new RC.

1. I see some unprintable characters in the logs. I don't think it was 
there before. I looked and found in 
org.apache.hadoop.hbase.master.ProcessRegionOpen in line 82, LOG.info() 
outputs "Bytes.toString(this.startCode)", which gives those non-printables.

Also, isn't it a bit misleading ProcessRegionOpen does not have it's own 
Log instance and therefore it appears as 
org.apache.hadoop.hbase.master.RegionServerOperation in the logs?

2. Now, a more serious thing. Everything worked fine for a day. For 
instance, it nicely survived a "kill -9" on a region with both ROOT and 
META. After the timeout the regions were reassigned and the cluster 
resumed normal operation. However, shortly after another regionserver 
was killed and the cluster rebounded I found my tables could not be read 
because, for some regions, HBase thinks the data is located in wrong 
files on DFS! Here is what the client says now when trying to get data 
from one of the tables:

09/06/05 12:56:58 ERROR hbase.CHBaseTest: I/O error occured.
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to 
contact region server 192.168.37.26:60020 for region 
log_trace,,1243520531393, row '', but failed after 10 attempts.
Exceptions:
java.io.IOException: java.io.IOException: HStoreScanner failed construction
	at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:68)
	at 
org.apache.hadoop.hbase.regionserver.HStoreScanner.<init>(HStoreScanner.java:92)
	at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2134)
	at 
org.apache.hadoop.hbase.regionserver.HRegion$HScanner.<init>(HRegion.java:2000)
	at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1187)
	at 
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1714)
	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
	at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
	at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
Caused by: java.io.FileNotFoundException: File does not exist: 
hdfs://namenode:9000/hbase/log_trace/1009406857/f/mapfiles/7948177727169613520/data
	at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
	at 
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
	at 
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
	at 
org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
	at 
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
	at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
	at 
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
	at 
org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
	at 
org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
	at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:127)
	at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:65)
	... 10 more

At this time /hbase/log_trace/1009406857/f/mapfiles contains 
"7416235210013649080" subdir with seemingly normal data, but no 
"7948177727169613520" subdir.

Also, I see endless errors on the regionservers from 
org.apache.hadoop.hbase.regionserver.CompactSplitThread complaining that 
this or that file does not exist.

It all seems to start about this moment:

2009-06-04 11:10:30,816 INFO 
org.apache.hadoop.hbase.regionserver.HRegion: starting  compaction on 
region log_activity,20090527221746113test746698248,1243481739735
2009-06-04 11:10:46,517 INFO 
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on 
region log_activity,20090527221746113test746698248,1243481739735 in 15sec
2009-06-04 11:10:46,517 INFO 
org.apache.hadoop.hbase.regionserver.HRegion: starting  compaction on 
region log_data_exchange,20090528220725927test2142083448,1243570162509
2009-06-04 11:11:12,821 INFO 
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on 
region log_data_exchange,20090528220725927test2142083448,1243570162509 
in 26sec
2009-06-04 11:11:12,822 INFO 
org.apache.hadoop.hbase.regionserver.HRegion: starting  compaction on 
region 
log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425
2009-06-04 11:11:33,899 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: 
..META.,,1: safeMode=false
2009-06-04 11:11:33,899 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: 
MSG_REGION_OPEN: .META.,,1: safeMode=false
2009-06-04 11:11:34,839 INFO 
org.apache.hadoop.hbase.regionserver.HRegion: region 
..META.,,1/1028785192 available
2009-06-04 11:11:44,874 INFO 
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on 
region 
log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425 
in 32sec
2009-06-04 11:11:44,875 INFO 
org.apache.hadoop.hbase.regionserver.HRegion: starting  compaction on 
region log_data_exchange,20090529115336764test675370772,1243644061166
2009-06-04 11:11:44,881 WARN 
org.apache.hadoop.hbase.regionserver.HStore: Failed with 
java.io.FileNotFoundException: File does not exist: 
hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data: 
2095948567/f/554805660147274743
2009-06-04 11:11:44,885 ERROR 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: 
Compaction/Split failed for region 
log_data_exchange,20090529115336764test675370772,1243644061166
java.io.FileNotFoundException: File does not exist: 
hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data
	at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
	at 
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
	at 
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
	at 
org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
	at 
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
	at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
	at 
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
	at 
org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
	at 
org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
	at 
org.apache.hadoop.hbase.regionserver.HStore.createReaders(HStore.java:1049)
	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:922)
	at 
org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:732)
	at 
org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:689)
	at 
org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:105)
2009-06-04 11:11:44,887 INFO 
org.apache.hadoop.hbase.regionserver.HRegion: starting  compaction on 
region log_activity,20090527160652393test2101793466,1243459595085
2009-06-04 11:12:07,218 INFO 
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on 
region log_activity,20090527160652393test2101793466,1243459595085 in 22sec


--Kirill

Re: Trying 0.19.4 RC1

Posted by stack <st...@duboce.net>.
On Fri, Jun 5, 2009 at 4:49 AM, Kirill Shabunov <e2...@yahoo.com> wrote:

>
> 1. I see some unprintable characters in the logs. I don't think it was
> there before. I looked and found in
> org.apache.hadoop.hbase.master.ProcessRegionOpen in line 82, LOG.info()
> outputs "Bytes.toString(this.startCode)", which gives those non-printables.



Thanks.... More logging was added in 0.19.4RC1.  I made your suggested fixes
in the branch.


2. Now, a more serious thing. Everything worked fine for a day. For
> instance, it nicely survived a "kill -9" on a region with both ROOT and
> META. After the timeout the regions were reassigned and the cluster resumed
> normal operation. However, shortly after another regionserver was killed and
> the cluster rebounded I found my tables could not be read because, for some
> regions, HBase thinks the data is located in wrong files on DFS! Here is
> what the client says now when trying to get data from one of the tables:



First, hbase can lose data on crash because HDFS does not have a working
flush.  Sorry if we've not been clear that such is the case.

A working flush will not be available until 0.21.0 HADOOP, it seems (though
HADOOP-4379 is looking promising as a stopgap till 0.21.0 hadoop comes
around).  We'll keep the list up to date as we learn more.

Whats happening below is covered in HBASE-1143.  In short, regions are being
doubly assigned.  On crash, the .META. has lost some edits so its picture as
to state of the cluster is off.

I'm going to put up 0.19.4RC2 in a minute.  It has some fixes that should
help; there is the more frequent flushing mentioned in hbase-1143 but, as it
turns out, we could also bungle our application of edits after crash so even
if they weren't lost, they weren't being applied properly.

Thanks for testing the RC Kirill.

St.Ack


> 09/06/05 12:56:58 ERROR hbase.CHBaseTest: I/O error occured.
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server 192.168.37.26:60020 for region log_trace,,1243520531393, row
> '', but failed after 10 attempts.
> Exceptions:
> java.io.IOException: java.io.IOException: HStoreScanner failed construction
>        at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:68)
>        at
> org.apache.hadoop.hbase.regionserver.HStoreScanner.<init>(HStoreScanner.java:92)
>        at
> org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2134)
>        at
> org.apache.hadoop.hbase.regionserver.HRegion$HScanner.<init>(HRegion.java:2000)
>        at
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1187)
>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1714)
>        at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> Caused by: java.io.FileNotFoundException: File does not exist:
> hdfs://namenode:9000/hbase/log_trace/1009406857/f/mapfiles/7948177727169613520/data
>        at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
>        at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
>        at
> org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
>        at
> org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
>        at
> org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
>        at
> org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
>        at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
>        at
> org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
>        at
> org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
>        at
> org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
>        at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:127)
>        at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:65)
>        ... 10 more
>
> At this time /hbase/log_trace/1009406857/f/mapfiles contains
> "7416235210013649080" subdir with seemingly normal data, but no
> "7948177727169613520" subdir.
>
> Also, I see endless errors on the regionservers from
> org.apache.hadoop.hbase.regionserver.CompactSplitThread complaining that
> this or that file does not exist.
>
> It all seems to start about this moment:
>
> 2009-06-04 11:10:30,816 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region
> log_activity,20090527221746113test746698248,1243481739735
> 2009-06-04 11:10:46,517 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region
> log_activity,20090527221746113test746698248,1243481739735 in 15sec
> 2009-06-04 11:10:46,517 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region
> log_data_exchange,20090528220725927test2142083448,1243570162509
> 2009-06-04 11:11:12,821 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region
> log_data_exchange,20090528220725927test2142083448,1243570162509 in 26sec
> 2009-06-04 11:11:12,822 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region
> log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425
> 2009-06-04 11:11:33,899 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> ..META.,,1: safeMode=false
> 2009-06-04 11:11:33,899 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
> .META.,,1: safeMode=false
> 2009-06-04 11:11:34,839 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region ..META.,,1/1028785192 available
> 2009-06-04 11:11:44,874 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region
> log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425
> in 32sec
> 2009-06-04 11:11:44,875 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region
> log_data_exchange,20090529115336764test675370772,1243644061166
> 2009-06-04 11:11:44,881 WARN org.apache.hadoop.hbase.regionserver.HStore:
> Failed with java.io.FileNotFoundException: File does not exist:
> hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data:
> 2095948567/f/554805660147274743
> 2009-06-04 11:11:44,885 ERROR
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split
> failed for region
> log_data_exchange,20090529115336764test675370772,1243644061166
> java.io.FileNotFoundException: File does not exist:
> hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data
>        at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
>        at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
>        at
> org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
>        at
> org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
>        at
> org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
>        at
> org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
>        at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
>        at
> org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
>        at
> org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
>        at
> org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
>        at
> org.apache.hadoop.hbase.regionserver.HStore.createReaders(HStore.java:1049)
>        at
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:922)
>        at
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:732)
>        at
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:689)
>        at
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:105)
> 2009-06-04 11:11:44,887 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region
> log_activity,20090527160652393test2101793466,1243459595085
> 2009-06-04 11:12:07,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region
> log_activity,20090527160652393test2101793466,1243459595085 in 22sec
>
>
> --Kirill
>