You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Kirill Shabunov <e2...@yahoo.com> on 2009/06/05 13:49:44 UTC
Trying 0.19.4 RC1
Hi!
First of all, thank you guys for your hard work!
I am testing this new RC.
1. I see some unprintable characters in the logs. I don't think it was
there before. I looked and found in
org.apache.hadoop.hbase.master.ProcessRegionOpen in line 82, LOG.info()
outputs "Bytes.toString(this.startCode)", which gives those non-printables.
Also, isn't it a bit misleading ProcessRegionOpen does not have it's own
Log instance and therefore it appears as
org.apache.hadoop.hbase.master.RegionServerOperation in the logs?
2. Now, a more serious thing. Everything worked fine for a day. For
instance, it nicely survived a "kill -9" on a region with both ROOT and
META. After the timeout the regions were reassigned and the cluster
resumed normal operation. However, shortly after another regionserver
was killed and the cluster rebounded I found my tables could not be read
because, for some regions, HBase thinks the data is located in wrong
files on DFS! Here is what the client says now when trying to get data
from one of the tables:
09/06/05 12:56:58 ERROR hbase.CHBaseTest: I/O error occured.
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server 192.168.37.26:60020 for region
log_trace,,1243520531393, row '', but failed after 10 attempts.
Exceptions:
java.io.IOException: java.io.IOException: HStoreScanner failed construction
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:68)
at
org.apache.hadoop.hbase.regionserver.HStoreScanner.<init>(HStoreScanner.java:92)
at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2134)
at
org.apache.hadoop.hbase.regionserver.HRegion$HScanner.<init>(HRegion.java:2000)
at
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1187)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1714)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
Caused by: java.io.FileNotFoundException: File does not exist:
hdfs://namenode:9000/hbase/log_trace/1009406857/f/mapfiles/7948177727169613520/data
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
at
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
at
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
at
org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
at
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
at
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
at
org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:127)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:65)
... 10 more
At this time /hbase/log_trace/1009406857/f/mapfiles contains
"7416235210013649080" subdir with seemingly normal data, but no
"7948177727169613520" subdir.
Also, I see endless errors on the regionservers from
org.apache.hadoop.hbase.regionserver.CompactSplitThread complaining that
this or that file does not exist.
It all seems to start about this moment:
2009-06-04 11:10:30,816 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
region log_activity,20090527221746113test746698248,1243481739735
2009-06-04 11:10:46,517 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on
region log_activity,20090527221746113test746698248,1243481739735 in 15sec
2009-06-04 11:10:46,517 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
region log_data_exchange,20090528220725927test2142083448,1243570162509
2009-06-04 11:11:12,821 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on
region log_data_exchange,20090528220725927test2142083448,1243570162509
in 26sec
2009-06-04 11:11:12,822 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
region
log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425
2009-06-04 11:11:33,899 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
..META.,,1: safeMode=false
2009-06-04 11:11:33,899 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
MSG_REGION_OPEN: .META.,,1: safeMode=false
2009-06-04 11:11:34,839 INFO
org.apache.hadoop.hbase.regionserver.HRegion: region
..META.,,1/1028785192 available
2009-06-04 11:11:44,874 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on
region
log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425
in 32sec
2009-06-04 11:11:44,875 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
region log_data_exchange,20090529115336764test675370772,1243644061166
2009-06-04 11:11:44,881 WARN
org.apache.hadoop.hbase.regionserver.HStore: Failed with
java.io.FileNotFoundException: File does not exist:
hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data:
2095948567/f/554805660147274743
2009-06-04 11:11:44,885 ERROR
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
Compaction/Split failed for region
log_data_exchange,20090529115336764test675370772,1243644061166
java.io.FileNotFoundException: File does not exist:
hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
at
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
at
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
at
org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
at
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
at
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
at
org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
at
org.apache.hadoop.hbase.regionserver.HStore.createReaders(HStore.java:1049)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:922)
at
org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:732)
at
org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:689)
at
org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:105)
2009-06-04 11:11:44,887 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
region log_activity,20090527160652393test2101793466,1243459595085
2009-06-04 11:12:07,218 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on
region log_activity,20090527160652393test2101793466,1243459595085 in 22sec
--Kirill
Re: Trying 0.19.4 RC1
Posted by stack <st...@duboce.net>.
On Fri, Jun 5, 2009 at 4:49 AM, Kirill Shabunov <e2...@yahoo.com> wrote:
>
> 1. I see some unprintable characters in the logs. I don't think it was
> there before. I looked and found in
> org.apache.hadoop.hbase.master.ProcessRegionOpen in line 82, LOG.info()
> outputs "Bytes.toString(this.startCode)", which gives those non-printables.
Thanks.... More logging was added in 0.19.4RC1. I made your suggested fixes
in the branch.
2. Now, a more serious thing. Everything worked fine for a day. For
> instance, it nicely survived a "kill -9" on a region with both ROOT and
> META. After the timeout the regions were reassigned and the cluster resumed
> normal operation. However, shortly after another regionserver was killed and
> the cluster rebounded I found my tables could not be read because, for some
> regions, HBase thinks the data is located in wrong files on DFS! Here is
> what the client says now when trying to get data from one of the tables:
First, hbase can lose data on crash because HDFS does not have a working
flush. Sorry if we've not been clear that such is the case.
A working flush will not be available until 0.21.0 HADOOP, it seems (though
HADOOP-4379 is looking promising as a stopgap till 0.21.0 hadoop comes
around). We'll keep the list up to date as we learn more.
Whats happening below is covered in HBASE-1143. In short, regions are being
doubly assigned. On crash, the .META. has lost some edits so its picture as
to state of the cluster is off.
I'm going to put up 0.19.4RC2 in a minute. It has some fixes that should
help; there is the more frequent flushing mentioned in hbase-1143 but, as it
turns out, we could also bungle our application of edits after crash so even
if they weren't lost, they weren't being applied properly.
Thanks for testing the RC Kirill.
St.Ack
> 09/06/05 12:56:58 ERROR hbase.CHBaseTest: I/O error occured.
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server 192.168.37.26:60020 for region log_trace,,1243520531393, row
> '', but failed after 10 attempts.
> Exceptions:
> java.io.IOException: java.io.IOException: HStoreScanner failed construction
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:68)
> at
> org.apache.hadoop.hbase.regionserver.HStoreScanner.<init>(HStoreScanner.java:92)
> at
> org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2134)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$HScanner.<init>(HRegion.java:2000)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1187)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1714)
> at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> Caused by: java.io.FileNotFoundException: File does not exist:
> hdfs://namenode:9000/hbase/log_trace/1009406857/f/mapfiles/7948177727169613520/data
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
> at
> org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
> at
> org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
> at
> org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
> at
> org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
> at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
> at
> org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
> at
> org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
> at
> org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:127)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:65)
> ... 10 more
>
> At this time /hbase/log_trace/1009406857/f/mapfiles contains
> "7416235210013649080" subdir with seemingly normal data, but no
> "7948177727169613520" subdir.
>
> Also, I see endless errors on the regionservers from
> org.apache.hadoop.hbase.regionserver.CompactSplitThread complaining that
> this or that file does not exist.
>
> It all seems to start about this moment:
>
> 2009-06-04 11:10:30,816 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting compaction on region
> log_activity,20090527221746113test746698248,1243481739735
> 2009-06-04 11:10:46,517 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region
> log_activity,20090527221746113test746698248,1243481739735 in 15sec
> 2009-06-04 11:10:46,517 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting compaction on region
> log_data_exchange,20090528220725927test2142083448,1243570162509
> 2009-06-04 11:11:12,821 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region
> log_data_exchange,20090528220725927test2142083448,1243570162509 in 26sec
> 2009-06-04 11:11:12,822 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting compaction on region
> log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425
> 2009-06-04 11:11:33,899 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> ..META.,,1: safeMode=false
> 2009-06-04 11:11:33,899 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
> .META.,,1: safeMode=false
> 2009-06-04 11:11:34,839 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region ..META.,,1/1028785192 available
> 2009-06-04 11:11:44,874 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region
> log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425
> in 32sec
> 2009-06-04 11:11:44,875 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting compaction on region
> log_data_exchange,20090529115336764test675370772,1243644061166
> 2009-06-04 11:11:44,881 WARN org.apache.hadoop.hbase.regionserver.HStore:
> Failed with java.io.FileNotFoundException: File does not exist:
> hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data:
> 2095948567/f/554805660147274743
> 2009-06-04 11:11:44,885 ERROR
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split
> failed for region
> log_data_exchange,20090529115336764test675370772,1243644061166
> java.io.FileNotFoundException: File does not exist:
> hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
> at
> org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
> at
> org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
> at
> org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
> at
> org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
> at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
> at
> org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
> at
> org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
> at
> org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
> at
> org.apache.hadoop.hbase.regionserver.HStore.createReaders(HStore.java:1049)
> at
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:922)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:732)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:689)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:105)
> 2009-06-04 11:11:44,887 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting compaction on region
> log_activity,20090527160652393test2101793466,1243459595085
> 2009-06-04 11:12:07,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region
> log_activity,20090527160652393test2101793466,1243459595085 in 22sec
>
>
> --Kirill
>