Posted to user@hbase.apache.org by Shankar hiremath <sh...@huawei.com> on 2014/07/25 16:08:25 UTC

HBase file encryption, inconsistencies observed and data loss

With HBase file encryption, some inconsistencies were observed and data loss occurred after running the hbck tool;
the operation steps are below.    (One thing I observed: on startup of HMaster, if it is not able to process a WAL file, the file is still moved to /oldWALs.)

Procedure:
1. Start the HBase services (HMaster and Region Server)
2. Enable HFile encryption and WAL file encryption as below, and perform 'table4-0' put operations (100 records added); a keystore-creation sketch follows the configuration
<property>
  <name>hbase.crypto.keyprovider</name>
  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
  <name>hbase.crypto.keyprovider.parameters</name>
  <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
</property>
<property>
  <name>hbase.crypto.master.key.name</name>
  <value>hdfs</value>
</property>
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
  <name>hbase.regionserver.hlog.writer.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
  <name>hbase.regionserver.wal.encryption</name>
  <value>true</value>
</property>
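
(Illustrative only.) A minimal sketch of how such a JCEKS keystore could be created, assuming the secret-key alias matches hbase.crypto.master.key.name ("hdfs" above) and the key entry is protected with the same password as the store; this is an assumption, not necessarily the exact procedure that was used:

// Sketch only: creates the JCEKS keystore referenced by
// hbase.crypto.keyprovider.parameters. Assumes the alias must equal
// hbase.crypto.master.key.name ("hdfs") and that the key entry uses the
// same password as the store.
import java.io.FileOutputStream;
import java.security.KeyStore;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class CreateHBaseMasterKeystore {
  public static void main(String[] args) throws Exception {
    char[] password = "Hadoop@234".toCharArray();

    // Generate an AES-128 secret key to act as the cluster master key.
    KeyGenerator kg = KeyGenerator.getInstance("AES");
    kg.init(128);
    SecretKey key = kg.generateKey();

    // Store it under the alias expected by hbase.crypto.master.key.name.
    KeyStore ks = KeyStore.getInstance("JCEKS");
    ks.load(null, password); // initialize an empty keystore
    ks.setEntry("hdfs", new KeyStore.SecretKeyEntry(key),
        new KeyStore.PasswordProtection(password));

    try (FileOutputStream out =
        new FileOutputStream("/opt/shankar1/kdc_keytab/hbase.jks")) {
      ks.store(out, password);
    }
  }
}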
3. The machine went down, so all processes went down

4. We disabled WAL file encryption for performance reasons and kept encryption only for HFile, as below
<property>
  <name>hbase.crypto.keyprovider</name>
  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
  <name>hbase.crypto.keyprovider.parameters</name>
  <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
</property>
<property>
  <name>hbase.crypto.master.key.name</name>
  <value>hdfs</value>
</property>
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
5. Start the Region Server and query the 'table4-0' data
hbase(main):003:0> count 'table4-0'
ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on XX-XX-XX-XX,60020,1406209023146
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
at java.lang.Thread.run(Thread.java:662)
6. Not able to read the data, so we decided to revert the configuration back to the original
7. Kill/stop the Region Server and revert all the configurations to the original, as below
<property>
  <name>hbase.crypto.keyprovider</name>
  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
  <name>hbase.crypto.keyprovider.parameters</name>
  <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
</property>
<property>
  <name>hbase.crypto.master.key.name</name>
  <value>hdfs</value>
</property>
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
  <name>hbase.regionserver.hlog.writer.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
  <name>hbase.regionserver.wal.encryption</name>
  <value>true</value>
</property>
8. Start the Region Server and perform the 'table4-0' query
hbase(main):003:0> count 'table4-0'
ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on XX-XX-XX-XX,60020,1406209023146
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
at java.lang.Thread.run(Thread.java:662)
9. Run hbase hbck to check for inconsistencies, as below
./hbase hbck -details
.........................
Summary:
table1-0 is okay.
Number of regions: 0
Deployed on:
table2-0 is okay.
Number of regions: 0
Deployed on:
table3-0 is okay.
Number of regions: 0
Deployed on:
table4-0 is okay.
Number of regions: 0
Deployed on:
table5-0 is okay.
Number of regions: 0
Deployed on:
table6-0 is okay.
Number of regions: 0
Deployed on:
table7-0 is okay.
Number of regions: 0
Deployed on:
table8-0 is okay.
Number of regions: 0
Deployed on:
table9-0 is okay.
Number of regions: 0
Deployed on:
hbase:meta is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
hbase:acl is okay.
Number of regions: 0
Deployed on:
hbase:namespace is okay.
Number of regions: 0
Deployed on:
22 inconsistencies detected.
Status: INCONSISTENT
2014-07-24 19:13:05,532 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2014-07-24 19:13:05,533 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1475d1611611bcf
2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x1475d1611611bcf
2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x1475d1611611bcf
2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet:: clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader:: 6,4295102074,0 request:: null response:: null
2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x1475d1611611bcf
2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x1475d1611611bcf : Unable to read additional data from server sessionid 0x1475d1611611bcf, likely server has closed socket
2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session: 0x1475d1611611bcf closed
shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
10. Fix the assignments, as below
./hbase hbck -fixAssignments
Summary:
table1-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table2-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table3-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table4-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table5-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table6-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table7-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table8-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table9-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
hbase:meta is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
hbase:acl is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
hbase:namespace is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
0 inconsistencies detected.
Status: OK
2014-07-24 19:44:55,194 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2014-07-24 19:44:55,194 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2475d15f7b31b73
2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x2475d15f7b31b73
2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x2475d15f7b31b73
2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet:: clientPath:null serverPath:null finished:false header:: 7,-11 replyHeader:: 7,4295102377,0 request:: null response:: null
2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x2475d15f7b31b73
2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x2475d15f7b31b73 : Unable to read additional data from server sessionid 0x2475d15f7b31b73, likely server has closed socket
2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session: 0x2475d15f7b31b73 closed
2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
11. Fix the assignments and meta, as below
./hbase hbck -fixAssignments -fixMeta
Summary:
table1-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table2-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table3-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table4-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table5-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table6-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table7-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table8-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
table9-0 is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
hbase:meta is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
hbase:acl is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
hbase:namespace is okay.
Number of regions: 1
Deployed on: XX-XX-XX-XX,60020,1406209023146
0 inconsistencies detected.
Status: OK
2014-07-24 19:46:16,290 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2014-07-24 19:46:16,290 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3475d1605321be9
2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x3475d1605321be9
2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x3475d1605321be9
2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet:: clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader:: 6,4295102397,0 request:: null response:: null
2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x3475d1605321be9
2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x3475d1605321be9 : Unable to read additional data from server sessionid 0x3475d1605321be9, likely server has closed socket
2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session: 0x3475d1605321be9 closed
2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
hbase(main):006:0> count 'table4-0'
0 row(s) in 0.0200 seconds
=> 0
hbase(main):007:0>
Complete data loss happened:
WALs, oldWALs and /hbase/data/default/table4-0/ do not contain any data
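
For completeness, a minimal sketch of how these directories can be listed with the Hadoop FileSystem API to confirm this, assuming the cluster configuration is on the classpath (illustrative only, not part of the original procedure):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListOldWals {
  public static void main(String[] args) throws Exception {
    // Sketch only: list /hbase/oldWALs (the same check can be pointed at
    // /hbase/WALs or /hbase/data/default/table4-0) to see whether any
    // edits survived.
    FileSystem fs = FileSystem.get(new Configuration());
    for (FileStatus st : fs.listStatus(new Path("/hbase/oldWALs"))) {
      System.out.println(st.getPath() + "  length=" + st.getLen());
    }
  }
}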









Re: HBase file encryption, inconsistencies observed and data loss

Posted by Ted Yu <yu...@gmail.com>.
I logged HBASE-11620 for this issue.

If my proposal is accepted, I can provide a patch.

Cheers



Re: HBase file encryption, inconsistencies observed and data loss

Posted by Andrew Purtell <ap...@apache.org>.
Let's take this to JIRA



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: HBase file encryption, inconsistencies observed and data loss

Posted by Ted Yu <yu...@gmail.com>.
In BaseDecoder#rethrowEofException() :

    if (!isEof) throw ioEx;

    LOG.error("Partial cell read caused by EOF: " + ioEx);

    EOFException eofEx = new EOFException("Partial cell read");

    eofEx.initCause(ioEx);

    throw eofEx;

Throwing EOFException would not propagate the "Partial cell read" condition
to HLogSplitter, which doesn't treat EOFException as an error.

I think a new exception type (e.g. DecoderException) should be used above.

Cheers
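
As a rough illustration of the idea (an assumption, not the actual change made under HBASE-11620), the rethrow could use a dedicated exception type derived from IOException so that callers such as HLogSplitter cannot silently swallow it as an ordinary EOFException:

import java.io.IOException;

// Hypothetical exception type for illustration; the real class name and
// package would be decided in the patch.
public class DecoderException extends IOException {
  public DecoderException(String message, Throwable cause) {
    super(message, cause);
  }
}

// Inside BaseDecoder#rethrowEofException(IOException ioEx) the rethrow
// could then become (sketch only):
//   if (!isEof) throw ioEx;
//   LOG.error("Partial cell read caused by EOF: " + ioEx);
//   throw new DecoderException("Partial cell read", ioEx);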


On Wed, Jul 30, 2014 at 10:22 AM, Ted Yu <yu...@gmail.com> wrote:

> Looking at HLogSplitter#getNextLogLine() :
>
>     try {
>
>       return in.next();
>
>     } catch (EOFException eof) {
>
>       // truncated files are expected if a RS crashes (see HBASE-2643)
>
>       LOG.info("EOF from hlog " + path + ".  continuing");
>
>       return null;
>
> The EOFException is not treated as an error. But the posted log doesn't
> contain "EOF from hlog ", so there may be another code path leading to
> codec.BaseDecoder.
>
> Cheers
>
>
> On Wed, Jul 30, 2014 at 9:20 AM, Kiran Kumar.M.R <
> Kiran.Kumar.MR@huawei.com> wrote:
>
>> Hi,
>>
>> After step 4 ( i.e disabling of WAL encryption, removing
>> SecureProtobufReader/Writer and restart), read of encrypted WAL fails
>> mainly due to EOF exception at Basedecoder. This is not considered as error
>> and these WAL are being moved to /oldWALs.
>>
>> Following is observed in log files:
>> 2014-07-30 19:44:29,254 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
>> wal.HLogSplitter: Splitting hlog:
>> hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017,
>> length=172
>> 2014-07-30 19:44:29,254 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
>> wal.HLogSplitter: DistributedLogReplay = false
>> 2014-07-30 19:44:29,313 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
>> util.FSHDFSUtils: Recovering lease on dfs file
>> hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
>> 2014-07-30 19:44:29,315 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
>> util.FSHDFSUtils: recoverLease=true, attempt=0 on
>> file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
>> after 1ms
>> 2014-07-30 19:44:29,429 DEBUG
>> [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0] wal.HLogSplitter: Writer
>> thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0,5,main]: starting
>> 2014-07-30 19:44:29,429 DEBUG
>> [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1] wal.HLogSplitter: Writer
>> thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1,5,main]: starting
>> 2014-07-30 19:44:29,430 DEBUG
>> [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2] wal.HLogSplitter: Writer
>> thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2,5,main]: starting
>> 2014-07-30 19:44:29,591 ERROR [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
>> codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException:
>> Premature EOF from inputStream
>> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
>> wal.HLogSplitter: Finishing writing output logs and closing down.
>> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
>> wal.HLogSplitter: Waiting for split writer threads to finish
>> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
>> wal.HLogSplitter: Split writers finished
>> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
>> wal.HLogSplitter: Processed 0 edits across 0 regions; log
>> file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
>> is corrupted = false progress failed = false
>>
>> To fix this, we need to propagate EOF exception to HLogSplitter. Any
>> suggestions on the fix?
>>
>>
>> Regards,
>> Kiran
>>

Re: HBase file encryption, inconsistencies observed and data loss

Posted by Ted Yu <yu...@gmail.com>.
Looking at HLogSplitter#getNextLogLine() :

    try {
      return in.next();
    } catch (EOFException eof) {
      // truncated files are expected if a RS crashes (see HBASE-2643)
      LOG.info("EOF from hlog " + path + ".  continuing");
      return null;

The EOFException is not treated as an error. But the posted log doesn't
contain "EOF from hlog " - there may be another code path leading to
codec.BaseDecoder.
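
To make the consequence concrete, a much simplified sketch of the split loop
being described (illustrative only, not the actual HLogSplitter code): because
the EOF is swallowed and surfaces as a null entry, the loop ends as if the log
were simply finished, which matches the "Processed 0 edits across 0 regions ...
is corrupted = false" line in the posted log.

    import java.io.EOFException;
    import java.io.IOException;

    class SplitLoopSketch {
      interface Reader {
        Object next() throws IOException;   // null when the log is exhausted
      }

      static int split(Reader in) throws IOException {
        int edits = 0;
        while (true) {
          Object entry;
          try {
            entry = in.next();
          } catch (EOFException eof) {
            entry = null;             // treated as an expected truncated file
          }
          if (entry == null) break;   // exits quietly, file not marked corrupt
          edits++;                    // never reached for the encrypted WAL
        }
        return edits;                 // 0 edits recovered
      }
    }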

Cheers


On Wed, Jul 30, 2014 at 9:20 AM, Kiran Kumar.M.R <Ki...@huawei.com>
wrote:

> Hi,
>
> After step 4 ( i.e disabling of WAL encryption, removing
> SecureProtobufReader/Writer and restart), read of encrypted WAL fails
> mainly due to EOF exception at Basedecoder. This is not considered as error
> and these WAL are being moved to /oldWALs.
>
> Following is observed in log files:
> 2014-07-30 19:44:29,254 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: Splitting hlog:
> hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017,
> length=172
> 2014-07-30 19:44:29,254 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: DistributedLogReplay = false
> 2014-07-30 19:44:29,313 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> util.FSHDFSUtils: Recovering lease on dfs file
> hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
> 2014-07-30 19:44:29,315 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> util.FSHDFSUtils: recoverLease=true, attempt=0 on
> file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
> after 1ms
> 2014-07-30 19:44:29,429 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0,5,main]: starting
> 2014-07-30 19:44:29,429 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1,5,main]: starting
> 2014-07-30 19:44:29,430 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2,5,main]: starting
> 2014-07-30 19:44:29,591 ERROR [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException:
> Premature EOF from inputStream
> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: Finishing writing output logs and closing down.
> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: Waiting for split writer threads to finish
> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: Split writers finished
> 2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: Processed 0 edits across 0 regions; log
> file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
> is corrupted = false progress failed = false
>
> To fix this, we need to propagate EOF exception to HLogSplitter. Any
> suggestions on the fix?
>
>
> Regards,
> Kiran
>

RE: HBase file encryption, inconsistencies observed and data loss

Posted by "Kiran Kumar.M.R" <Ki...@huawei.com>.
Hi,

After step 4 (i.e. disabling WAL encryption, removing SecureProtobufReader/Writer and restarting), reads of the encrypted WALs fail, mainly due to an EOF exception in BaseDecoder. This is not considered an error, and these WALs are being moved to /oldWALs.

Following is observed in log files:
2014-07-30 19:44:29,254 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: Splitting hlog: hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017, length=172
2014-07-30 19:44:29,254 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: DistributedLogReplay = false
2014-07-30 19:44:29,313 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] util.FSHDFSUtils: Recovering lease on dfs file hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
2014-07-30 19:44:29,315 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] util.FSHDFSUtils: recoverLease=true, attempt=0 on file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017 after 1ms
2014-07-30 19:44:29,429 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0,5,main]: starting
2014-07-30 19:44:29,429 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1,5,main]: starting
2014-07-30 19:44:29,430 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2,5,main]: starting
2014-07-30 19:44:29,591 ERROR [RS_LOG_REPLAY_OPS-HOST-16:15264-1] codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream
2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: Finishing writing output logs and closing down.
2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: Waiting for split writer threads to finish
2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: Split writers finished
2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] wal.HLogSplitter: Processed 0 edits across 0 regions; log file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017 is corrupted = false progress failed = false

To fix this, we need to propagate the EOF exception to HLogSplitter. Any suggestions on the fix?


Regards,
Kiran

Re: HBase file encryption, inconsistencies observed and data loss

Posted by Andrew Purtell <an...@gmail.com>.
Got it. 

The user should not change the WAL reader class specified in site configuration once there may be encrypted WALs written. However the writer could be changed or configured not to encrypt. We could update the security section of the manual to clarify. 
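
A minimal sketch of such a site configuration - illustrative only, not a
tested recipe; see also the configuration Anoop quotes further down as working
for Shankar. The invariant is that the reader stays SecureProtobufLogReader so
that already-written encrypted WALs can still be replayed, while WAL encryption
is switched off for new writes; per the above, the writer entry could be left
secure or changed back.

    <property>
      <name>hbase.regionserver.hlog.reader.impl</name>
      <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
    </property>
    <property>
      <name>hbase.regionserver.wal.encryption</name>
      <value>false</value>
    </property>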


> On Jul 28, 2014, at 7:06 PM, Anoop John <an...@gmail.com> wrote:
> 
> Yes, in between the restarts the config was changed; in the steps, #4 was that.
> The WAL encryption config is changed to false. That is ok, but the reader
> cannot be changed, because we don't pick the reader by looking at WAL file
> metadata to see whether the file is encrypted or not. WAL reading has always
> required the user to configure the correct reader, so not sure whether any
> code change is needed or not. Once WAL encryption was enabled, even after
> changing it back to off the reader should continue to be
> SecureProtobufLogReader (at least till all existing WALs are replayed).
> 
> And files being moved to /oldWALs rather than to the corrupt folder is
> something to be checked. Any chance for a look there and a patch, Shankar?
> 
> Anoop
> 
> 
> Anoop
> 
> 
> 
> 
>> On Sunday, July 27, 2014, Andrew Purtell <an...@gmail.com> wrote:
>> So the regionserver configuration was changed after it crashed but before
> it was restarted ?
>> 
>> The impression given by the initial report is that simply using encrypted
> WALs will cause data loss. That's not the case as I have confirmed. There
> could be an edge case somewhere but the original reporter has left out
> important detail about how to reproduce the problem. The below is not
> written in clear language either, so I'm not following along. I'd be happy
> to help look at this more once clear steps for reproducing the problem are
> available. Otherwise since you're talking with Shankar somehow offline
> already I'll leave you to it Anoop.
>> 
>>> Also when the file can not be read, this is not moved under corrupt logs
> is a concerning thing.  Need to look at that.
>> 
>> Agreed.
>> 
>> 
>>> On Jul 27, 2014, at 1:07 AM, Anoop John <an...@gmail.com> wrote:
>>> 
>>> As per Shankar, he can get things to work with the below configs:
>>> 
>>> <property>
>>>   <name>hbase.regionserver.hlog.reader.impl</name>
>>>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>> </property>
>>> <property>
>>>   <name>hbase.regionserver.hlog.writer.impl</name>
>>>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>> </property>
>>> <property>
>>>   <name>hbase.regionserver.wal.encryption</name>
>>>   <value>false</value>
>>> </property>
>>> 
>>> Once the RS crash happened, the configuration was kept the way shown above.
>>> Note that WAL encryption is now disabled, but the reader is still
>>> SecureProtobufLogReader. The existing WAL files are encrypted and only
>>> SecureProtobufLogReader can read them. If it is not configured, the default
>>> reader, ProtobufLogReader, cannot read them back correctly. That is the
>>> issue Shankar faced.
>>> 
>>> Also, it is concerning that when the file cannot be read it is not moved
>>> under the corrupt logs folder. Need to look at that.
>>> 
>>> -Anoop-
>>> 
>>> On Sat, Jul 26, 2014 at 11:17 PM, Andrew Purtell <
> andrew.purtell@gmail.com>
>>> wrote:
>>> 
>>>> My attempt to reproduce this issue:
>>>> 
>>>> 1. Set up Hadoop 2.4.1 namenode, secondarynamenode, and datanode on a
> dev
>>>> box.
>>>> 
>>>> 2. Set up HBase 0.98.5-SNAPSHOT hosted zk, master, and regionserver
> also on
>>>> this dev box.
>>>> 
>>>> 3. Set dfs.replication and
> hbase.regionserver.hlog.tolerable.lowreplication
>>>> to 1. Set up a keystore and enabled WAL encryption.
>>>> 
>>>> 4. Created a test table.
>>>> 
>>>> 5. Used YCSB to write 1000 rows to the test table. No flushes observed.
>>>> 
>>>> 6. Used the shell to count the number of records in the test table.
> Count =
>>>> 1000 rows
>>>> 
>>>> 7. kill -9 the regionserver process.
>>>> 
>>>> 8. Started a new regionserver process. Observed log splitting and
> replay in
>>>> the regionserver log, no errors.
>>>> 
>>>> 9. Used the shell to count the number of records in the test table.
> Count =
>>>> 1000 rows
>>>> 
>>>> Tried this a few times.
>>>> 
>>>> Shankar, can you try running through the above and let us know if the
>>>> outcome is different?
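
(For anyone re-running the steps above, a rough client-side equivalent of the write-and-count parts, as a sketch only: it assumes a pre-created table named 'test' with column family 'f' and the 0.98-era client API; the kill -9 and restart of the regionserver still happen outside this code, and YCSB can of course be used instead.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class WalRecoverySmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml
    HTable table = new HTable(conf, "test");            // assumed table with family 'f'
    byte[] family = Bytes.toBytes("f");
    // Write 1000 rows; with no memstore flush they exist only in the WAL and
    // memstore, not in HFiles.
    for (int i = 0; i < 1000; i++) {
      Put put = new Put(Bytes.toBytes(String.format("row%04d", i)));
      put.add(family, Bytes.toBytes("q"), Bytes.toBytes("v" + i));
      table.put(put);
    }
    table.flushCommits();
    // Count rows; run this part again after kill -9 and restart of the
    // regionserver, and the count should still be 1000 if WAL replay worked.
    ResultScanner scanner = table.getScanner(new Scan());
    int count = 0;
    for (Result r : scanner) {
      count++;
    }
    scanner.close();
    System.out.println("rows = " + count);
    table.close();
  }
}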
>>>> 
>>>> 
>>>> 
>>>> On Sat, Jul 26, 2014 at 8:54 AM, Andrew Purtell <
> andrew.purtell@gmail.com>
>>>> wrote:
>>>> 
>>>>> Thanks for the detail. So to summarize:
>>>>> 
>>>>> 0. HBase 0.98.3 and HDFS 2.4.1
>>>>> 
>>>>> 1. All data written before the failure had not yet been flushed, so it
>>>>> exists only in the WAL files.
>>>>> 
>>>>> 2. During distributed splitting, the WAL has either not been written
> out
>>>>> or is unreadable:
>>>>> 
>>>>> 
>>>>> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>>> codec.BaseDecoder: Partial cell read caused by EOF:
> java.io.IOException:
>>>>> Premature EOF from inputStream
>>>>> 
>>>>> 
>>>>> 3. This file is still moved to oldWALs even though splitting failed.
>>>>> 
>>>>> 4. Setting 'hbase.regionserver.wal.encryption' to false allows for data
>>>>> recovery in your scenario.
>>>>> 
>>>>> See https://issues.apache.org/jira/browse/HBASE-11595
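
(Point 3 is what HBASE-11595 tracks. A hypothetical sketch, not the actual HBase code, of the behaviour being argued for: a WAL that fails to split should be parked under a corrupt directory instead of being archived to oldWALs, so it can still be replayed later with the correct reader. Directory and method names below are illustrative.)

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalArchiveSketch {
  // Hypothetical post-split step: only cleanly split WALs go to the oldWALs
  // archive; anything that failed to split is kept aside for inspection/replay.
  static void archiveAfterSplit(FileSystem fs, Path wal, boolean splitSucceeded,
                                Path oldWalsDir, Path corruptDir) throws IOException {
    Path target = splitSucceeded ? oldWalsDir : corruptDir;
    fs.mkdirs(target);
    // Keeping the original file name lets the WAL be identified and replayed
    // later once the correct (Secure) reader is configured again.
    fs.rename(wal, new Path(target, wal.getName()));
  }
}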
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Jul 26, 2014, at 6:50 AM, Shankar hiremath <
>>>> shankar.hiremath@huawei.com>
>>>>> wrote:
>>>>> 
>>>>> 
>>>>> Hi Andrew,
>>>>> 
>>>>> 
>>>>> Please find the details
>>>>> 
>>>>> 
>>>>> Hbase 0.98.3 & hadoop 2.4.1
>>>>> 
>>>>> Hbase root file system on hdfs
>>>>> 
>>>>> 
>>>>> On the HMaster side there is no failure or error message in the log file.
>>>>> 
>>>>> On the Region Server side, the error message below was reported:
>>>>> 
>>>>> 
>>>>> Region Server Log:
>>>>> 
>>>>> 2014-07-26 19:29:15,904 DEBUG
> [regionserver60020-SendThread(host2:2181)]
>>>>> zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c,
> packet::
>>>>> clientPath:null serverPath:null finished:false header:: 172,4
>>>>> replyHeader:: 172,4294988825,0  request:: '/hbase/table/hbase:acl,F
>>>>> response::
> #ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
>>>>> 
>>>>> 2014-07-26 19:29:15,905 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,905 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,905 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
>>>>> 
>>>>> 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15]
>>>>> wal.HLogSplitter: Writer thread
>>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting
>>>>> 
>>>>> 
>>>>> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>>> codec.BaseDecoder: Partial cell read caused by EOF:
> java.io.IOException:
>>>>> Premature EOF from inputStream
>>>>> 
>>>>> 
>>>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>>> wal.HLogSplitter: Finishing writing output logs and closing down.
>>>>> 
>>>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>>> wal.HLogSplitter: Waiting for split writer threads to finish
>>>>> 
>>>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>>> wal.HLogSplitter: Split writers finished
>>>>> 
>>>>> 2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>>> wal.HLogSplitter: Processed 0 edits across 0 regions; log
> file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta
>>>>> is corrupted = false progress failed = false
>>>>> 
>>>>> 2014-07-26 19:29:16,184 DEBUG
> [regionserver60020-SendThread(host2:2181)]
>>>>> zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c
>>>>> 
>>>>> 
>>>>> 
>>>>> When I query the table, the data that was in the WAL files (before the
>>>>> RegionServer machine went down) does not come back.
>>>>> 
>>>>> One more thing I observed: even when the WAL file is not successfully
>>>>> processed, it is still moved to the /oldWALs folder.
>>>>> 
>>>>> So when I revert the three configurations below on the Region Server side
>>>>> and restart, the WAL has already been moved to the oldWALs/ folder, so it
>>>>> will not get processed.
>>>>> 
>>>>> 
>>>>> <property>
>>>>> 
>>>>>  <name>hbase.regionserver.hlog.reader.impl</name>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.hlog.writer.impl</name>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.wal.encryption</name>
>>>>> 
>>>>> <value>true</value>
>>>>> 
>>>>> </property>
> -------------------------------------------------------------------------------------------------------------
>>>>> 
>>>>> 
>>>>> And one more scenario I tried (Anoop suggested it), with the configuration
>>>>> below: instead of deleting the three config parameters, keep them all but
>>>>> set only 'hbase.regionserver.wal.encryption=false'. The encrypted WAL file
>>>>> then gets processed successfully, and querying the table returns the WAL
>>>>> data (from before the RegionServer machine went down) correctly.
>>>>> 
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.hlog.reader.impl</name>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.hlog.writer.impl</name>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.wal.encryption</name>
>>>>> 
>>>>> <value>false</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> 
>>>>> 
>>>>> Regards
>>>>> 
>>>>> -Shankar
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> 
>>>>> From: andrew.purtell@gmail.com [mailto:andrew.purtell@gmail.com
>>>>> <an...@gmail.com>] On Behalf Of Andrew Purtell
>>>>> 
>>>>> Sent: 26 July 2014 AM 02:21
>>>>> 
>>>>> To: user@hbase.apache.org
>>>>> 
>>>>> Subject: Re: HBase file encryption, inconsistencies observed and data
>>>> loss
>>>>> 
>>>>> 
>>>>> Encryption (or the lack of it) doesn't explain missing HFiles.
>>>>> 
>>>>> 
>>>>> Most likely if you are having a problem with encryption, this will
>>>>> manifest as follows: HFiles will be present. However, you will find
> many
>>>>> IOExceptions in the regionserver logs as they attempt to open the
> HFiles
>>>>> but fail because the data is unreadable.
>>>>> 
>>>>> 
>>>>> We should start by looking at more basic issues. What could explain the
>>>>> total disappearance of HFiles?
>>>>> 
>>>>> 
>>>>> Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on
>>>>> the local filesystem (fs URL starts with file://)?
>>>>> 
>>>>> 
>>>>> In your email you provide only exceptions printed by the client. What
>>>> kind
>>>>> of exceptions appear in the regionserver logs? Or appear in the master
>>>> log?
>>>>> 
>>>>> If the logs are large your best bet is to pastebin them and then send
> the
>>>>> URL to the paste in your response.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath <
>>>>> shankar.hiremath@huawei.com> wrote:
>>>>> 
>>>>> 
>>>>> HBase file encryption some inconsistencies observed and data loss
>>>>> 
>>>>> happens after running the hbck tool,
>>>>> 
>>>>> the operation steps are as below.    (one thing what I observed is, on
>>>>> 
>>>>> startup of HMaster if it is not able to process the WAL file, then
>>>>> 
>>>>> also it moved to /oldWALs)
>>>>> 
>>>>> 
>>>>> Procedure:
>>>>> 
>>>>> 1. Start the Hbase services (HMaster & region Server) 2. Enable HFile
>>>>> 
>>>>> encryption and WAL file encryption as below, and perform 'table4-0'
>>>>> 
>>>>> put operations (100 records added) <property>
>>>>> 
>>>>> <name>hbase.crypto.keyprovider</name>
>>>>> 
>>>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.crypto.keyprovider.parameters</name>
>>>>> 
>>>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>>>>> 
>>>>> </value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.crypto.master.key.name</name>
>>>>> 
>>>>> <value>hdfs</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hfile.format.version</name>
>>>>> 
>>>>> <value>3</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.hlog.reader.impl</name>
>>>>> 
>>>>> 
>>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>>>>> 
>>>>> r</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.hlog.writer.impl</name>
>>>>> 
>>>>> 
>>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
>>>>> 
>>>>> r</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.wal.encryption</name>
>>>>> 
>>>>> <value>true</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> 3. Machine went down, so all process went down
>>>>> 
>>>>> 
>>>>> 4. We disabled the WAL file encryption for performance reason, and
>>>>> 
>>>>> keep encryption only for Hfile, as below <property>
>>>>> 
>>>>> <name>hbase.crypto.keyprovider</name>
>>>>> 
>>>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.crypto.keyprovider.parameters</name>
>>>>> 
>>>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>>>>> 
>>>>> </value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.crypto.master.key.name</name>
>>>>> 
>>>>> <value>hdfs</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hfile.format.version</name>
>>>>> 
>>>>> <value>3</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> 5. Start the Region Server and query the 'table4-0' data
>>>>> 
>>>>> hbase(main):003:0> count 'table4-0'
>>>>> 
>>>>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
>>>>> 
>>>>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
>>>>> 
>>>>> online on
>>>>> 
>>>>> XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
>>>>> 
>>>>> ame(HRegionServer.java:2685)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
>>>>> 
>>>>> rver.java:4119)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
>>>>> 
>>>>> java:3066)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
>>>>> 
>>>>> 2.callBlockingMethod(ClientProtos.java:29497)
>>>>> 
>>>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>>>>> 
>>>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
>>>>> 
>>>>> cheduler.java:168)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
>>>>> 
>>>>> eduler.java:39)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
>>>>> 
>>>>> r.java:111) at java.lang.Thread.run(Thread.java:662)
>>>>> 
>>>>> 6. Not able to read the data, so we decided to revert back the
>>>>> 
>>>>> configuration (as original) 7. Kill/Stop the Region Server, revert all
>>>>> 
>>>>> the configurations as original, as below <property>
>>>>> 
>>>>> <name>hbase.crypto.keyprovider</name>
>>>>> 
>>>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.crypto.keyprovider.parameters</name>
>>>>> 
>>>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>>>>> 
>>>>> </value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.crypto.master.key.name</name>
>>>>> 
>>>>> <value>hdfs</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hfile.format.version</name>
>>>>> 
>>>>> <value>3</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.hlog.reader.impl</name>
>>>>> 
>>>>> 
>>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>>>>> 
>>>>> r</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.hlog.writer.impl</name>
>>>>> 
>>>>> 
>>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
>>>>> 
>>>>> r</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> 
>>>>> <name>hbase.regionserver.wal.encryption</name>
>>>>> 
>>>>> <value>true</value>
>>>>> 
>>>>> </property>
>>>>> 
>>>>> 7. Start the Region Server, and perform the 'table4-0' query
>>>>> 
>>>>> hbase(main):003:0> count 'table4-0'
>>>>> 
>>>>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
>>>>> 
>>>>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
>>>>> 
>>>>> online on
>>>>> 
>>>>> XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
>>>>> 
>>>>> ame(HRegionServer.java:2685)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
>>>>> 
>>>>> rver.java:4119)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
>>>>> 
>>>>> java:3066)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
>>>>> 
>>>>> 2.callBlockingMethod(ClientProtos.java:29497)
>>>>> 
>>>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>>>>> 
>>>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
>>>>> 
>>>>> cheduler.java:168)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
>>>>> 
>>>>> eduler.java:39)
>>>>> 
>>>>> at
>>>>> 
>>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
>>>>> 
>>>>> r.java:111) at java.lang.Thread.run(Thread.java:662)
>>>>> 
>>>>> 8. Run the hbase hbck to repair, as below ./hbase hbck -details
>>>>> 
>>>>> .........................
>>>>> 
>>>>> Summary:
>>>>> 
>>>>> table1-0 is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> table2-0 is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> table3-0 is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> table4-0 is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> table5-0 is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> table6-0 is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> table7-0 is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> table8-0 is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> table9-0 is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> hbase:meta is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> hbase:namespace is okay.
>>>>> 
>>>>> Number of regions: 0
>>>>> 
>>>>> Deployed on:
>>>>> 
>>>>> 22 inconsistencies detected.
>>>>> 
>>>>> Status: INCONSISTENT
>>>>> 
>>>>> 2014-07-24 19:13:05,532 INFO [main]
>>>>> 
>>>>> client.HConnectionManager$HConnectionImplementation: Closing master
>>>>> 
>>>>> protocol: MasterService
>>>>> 
>>>>> 2014-07-24 19:13:05,533 INFO [main]
>>>>> 
>>>>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>>>>> 
>>>>> sessionid=0x1475d1611611bcf
>>>>> 
>>>>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing
>>>> session:
>>>>> 
>>>>> 0x1475d1611611bcf
>>>>> 
>>>>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing
>>>>> 
>>>>> client for session: 0x1475d1611611bcf
>>>>> 
>>>>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>> 
>>>>> zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf,
> packet::
>>>>> 
>>>>> clientPath:null serverPath:null finished:false header:: 6,-11
>>>> replyHeader::
>>>>> 
>>>>> 6,4295102074,0 request:: null response:: null
>>>>> 
>>>>> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn:
>>>>> 
>>>>> Disconnecting client for session: 0x1475d1611611bcf
>>>>> 
>>>>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>> 
>>>>> zookeeper.ClientCnxn: An exception was thrown while closing send
>>>>> 
>>>>> thread for session 0x1475d1611611bcf : Unable to read additional data
>>>>> 
>>>>> from server sessionid 0x1475d1611611bcf, likely server has closed
>>>>> 
>>>>> socket
>>>>> 
>>>>> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
>>>>> 
>>>>> EventThread shut down
>>>>> 
>>>>> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
>>>>> 
>>>>> 0x1475d1611611bcf closed
>>>>> 
>>>>> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
>>>>> 
>>>>> 9. Fix the assignments as below
>>>>> 
>>>>> ./hbase hbck -fixAssignments
>>>>> 
>>>>> Summary:
>>>>> 
>>>>> table1-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table2-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table3-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table4-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table5-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table6-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table7-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table8-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table9-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> 0 inconsistencies detected.
>>>>> 
>>>>> Status: OK
>>>>> 
>>>>> 2014-07-24 19:44:55,194 INFO [main]
>>>>> 
>>>>> client.HConnectionManager$HConnectionImplementation: Closing master
>>>>> 
>>>>> protocol: MasterService
>>>>> 
>>>>> 2014-07-24 19:44:55,194 INFO [main]
>>>>> 
>>>>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>>>>> 
>>>>> sessionid=0x2475d15f7b31b73
>>>>> 
>>>>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing
>>>> session:
>>>>> 
>>>>> 0x2475d15f7b31b73
>>>>> 
>>>>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing
>>>>> 
>>>>> client for session: 0x2475d15f7b31b73
>>>>> 
>>>>> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>> 
>>>>> zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73,
> packet::
>>>>> 
>>>>> clientPath:null serverPath:null finished:false header:: 7,-11
>>>> replyHeader::
>>>>> 
>>>>> 7,4295102377,0 request:: null response:: null
>>>>> 
>>>>> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn:
>>>>> 
>>>>> Disconnecting client for session: 0x2475d15f7b31b73
>>>>> 
>>>>> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>> 
>>>>> zookeeper.ClientCnxn: An exception was thrown while closing send
>>>>> 
>>>>> thread for session 0x2475d15f7b31b73 : Unable to read additional data
>>>>> 
>>>>> from server sessionid 0x2475d15f7b31b73, likely server has closed
>>>>> 
>>>>> socket
>>>>> 
>>>>> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
>>>>> 
>>>>> 0x2475d15f7b31b73 closed
>>>>> 
>>>>> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
>>>>> 
>>>>> EventThread shut down
>>>>> 
>>>>> 10. Fix the assignments and meta as below
>>>>> 
>>>>> ./hbase hbck -fixAssignments -fixMeta
>>>>> 
>>>>> Summary:
>>>>> 
>>>>> table1-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table2-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table3-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table4-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table5-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table6-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table7-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table8-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> table9-0 is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
>>>>> 
>>>>> Number of regions: 1
>>>>> 
>>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>> 
>>>>> 0 inconsistencies detected.
>>>>> 
>>>>> Status: OK
>>>>> 
>>>>> 2014-07-24 19:46:16,290 INFO [main]
>>>>> 
>>>>> client.HConnectionManager$HConnectionImplementation: Closing master
>>>>> 
>>>>> protocol: MasterService
>>>>> 
>>>>> 2014-07-24 19:46:16,290 INFO [main]
>>>>> 
>>>>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>>>>> 
>>>>> sessionid=0x3475d1605321be9
>>>>> 
>>>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing
>>>> session:
>>>>> 
>>>>> 0x3475d1605321be9
>>>>> 
>>>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing
>>>>> 
>>>>> client for session: 0x3475d1605321be9
>>>>> 
>>>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>> 
>>>>> zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9,
> packet::
>>>>> 
>>>>> clientPath:null serverPath:null finished:false header:: 6,-11
>>>> replyHeader::
>>>>> 
>>>>> 6,4295102397,0 request:: null response:: null
>>>>> 
>>>>> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn:
>>>>> 
>>>>> Disconnecting client for session: 0x3475d1605321be9
>>>>> 
>>>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>> 
>>>>> zookeeper.ClientCnxn: An exception was thrown while closing send
>>>>> 
>>>>> thread for session 0x3475d1605321be9 : Unable to read additional data
>>>>> 
>>>>> from server sessionid 0x3475d1605321be9, likely server has closed
>>>>> 
>>>>> socket
>>>>> 
>>>>> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
>>>>> 
>>>>> 0x3475d1605321be9 closed
>>>>> 
>>>>> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
>>>>> 
>>>>> EventThread shut down
>>>>> 
>>>>> hbase(main):006:0> count 'table4-0'
>>>>> 
>>>>> 0 row(s) in 0.0200 seconds
>>>>> 
>>>>> => 0
>>>>> 
>>>>> hbase(main):007:0>
>>>>> 
>>>>> Complete data loss has happened:
>>>>> 
>>>>> WALs, oldWALs, and /hbase/data/default/table4-0/ do not contain any data.
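
(A small sketch of how this can be double-checked from the HDFS side, assuming the /hbase root directory used in this thread and an HDFS client configuration on the classpath; the directory names are the ones mentioned above.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckRemainingFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // expects core-site.xml/hdfs-site.xml on the classpath
    FileSystem fs = FileSystem.get(conf);
    // Directories named in the report; adjust if hbase.rootdir differs.
    String[] dirs = {"/hbase/WALs", "/hbase/oldWALs", "/hbase/data/default/table4-0"};
    for (String dir : dirs) {
      Path p = new Path(dir);
      FileStatus[] entries = fs.exists(p) ? fs.listStatus(p) : new FileStatus[0];
      System.out.println(dir + ": " + entries.length + " entries");
    }
  }
}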
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> 
>>>>> - Andy
>>>>> 
>>>>> 
>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein
>>>>> (via Tom White)
>> 

RE: HBase file encryption, inconsistencies observed and data loss

Posted by "Kiran Kumar.M.R" <Ki...@huawei.com>.



> -----Original Message-----
> From: Anoop John [mailto:anoop.hbase@gmail.com]
> Sent: Tuesday, July 29, 2014 07:36
> To: user@hbase.apache.org
> Subject: Re: HBase file encryption, inconsistencies observed and data
> loss
> 
> Yes, in between the restarts the configuration was changed; step #4 was that
> change. The WAL encryption config was set to false. That by itself is fine,
> but the reader cannot be changed, because we do not pick the reader by looking
> at the WAL file metadata to see whether the file is encrypted or not. WAL
> reading has always worked this way: the user has to configure the correct
> reader. So I am not sure whether any code change is needed. Once WAL
> encryption has been used, even after turning it back off, the reader should
> continue to be SecureProtobufLogReader (at least until all existing WALs are
> replayed).
> 
> And that files are moved to oldWALs but not to the corrupt folder is something
> to be checked. Any chance you could take a look there and put up a patch, Shankar?

[Kiran]  Anoop, we are checking this issue. Will submit a patch if needed.
> >>>>
> >>>> 0 inconsistencies detected.
> >>>>
> >>>> Status: OK
> >>>>
> >>>> 2014-07-24 19:46:16,290 INFO [main]
> >>>>
> >>>> client.HConnectionManager$HConnectionImplementation: Closing
> master
> >>>>
> >>>> protocol: MasterService
> >>>>
> >>>> 2014-07-24 19:46:16,290 INFO [main]
> >>>>
> >>>> client.HConnectionManager$HConnectionImplementation: Closing
> >>>> zookeeper
> >>>>
> >>>> sessionid=0x3475d1605321be9
> >>>>
> >>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing
> >>> session:
> >>>>
> >>>> 0x3475d1605321be9
> >>>>
> >>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing
> >>>>
> >>>> client for session: 0x3475d1605321be9
> >>>>
> >>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> >>>>
> >>>> zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9,
> packet::
> >>>>
> >>>> clientPath:null serverPath:null finished:false header:: 6,-11
> >>> replyHeader::
> >>>>
> >>>> 6,4295102397,0 request:: null response:: null
> >>>>
> >>>> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn:
> >>>>
> >>>> Disconnecting client for session: 0x3475d1605321be9
> >>>>
> >>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> >>>>
> >>>> zookeeper.ClientCnxn: An exception was thrown while closing send
> >>>>
> >>>> thread for session 0x3475d1605321be9 : Unable to read additional
> >>>> data
> >>>>
> >>>> from server sessionid 0x3475d1605321be9, likely server has closed
> >>>>
> >>>> socket
> >>>>
> >>>> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
> >>>>
> >>>> 0x3475d1605321be9 closed
> >>>>
> >>>> 2014-07-24 19:46:16,300 INFO [main-EventThread]
> zookeeper.ClientCnxn:
> >>>>
> >>>> EventThread shut down
> >>>>
> >>>> hbase(main):006:0> count 'table4-0'
> >>>>
> >>>> 0 row(s) in 0.0200 seconds
> >>>>
> >>>> => 0
> >>>>
> >>>> hbase(main):007:0>
> >>>>
> >>>> Complete data loss happened,
> >>>>
> >>>> WALs, oldWALs & /hbase/data/default/table4-0/ does not have any
> >>>> data
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> Best regards,
> >>>>
> >>>>
> >>>> - Andy
> >>>>
> >>>>
> >>>> Problems worthy of attack prove their worth by hitting back. -
> Piet
> Hein
> >>>> (via Tom White)
> >>>
> >

Re: HBase file encryption, inconsistencies observed and data loss

Posted by Anoop John <an...@gmail.com>.
Yes, in between the crash and the restart the config was changed; that was step #4 in the procedure.
The WAL encryption config was changed to false. That by itself is OK, but the reader cannot be
changed, because we do not pick the reader by looking at the WAL file metadata to see whether the
file is encrypted or not. WAL reading has always worked this way: the user has to configure the
correct reader, so I am not sure whether any code change is needed. Once WAL encryption has been
used, even after turning it back off, the reader should continue to be SecureProtobufLogReader
(at least until all existing WALs are replayed).
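
For reference, a minimal hbase-site.xml sketch of that rollback, using the same property names
quoted below in this thread (keep the secure reader/writer configured, and only turn the
encryption flag off):

<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
  <name>hbase.regionserver.hlog.writer.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
  <name>hbase.regionserver.wal.encryption</name>
  <value>false</value>
</property>

Once all of the existing encrypted WALs have been replayed, it should be safe to drop the
reader/writer overrides as well.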

And the fact that the files are moved to oldWALs, but not to the corrupt folder, is something to
be checked. Any chance you can take a look there and put up a patch, Shankar?

Anoop




On Sunday, July 27, 2014, Andrew Purtell <an...@gmail.com> wrote:
> So the regionserver configuration was changed after it crashed but before
> it was restarted?
>
> The impression given by the initial report is that simply using encrypted
> WALs will cause data loss. That's not the case, as I have confirmed. There
> could be an edge case somewhere, but the original reporter has left out
> important details about how to reproduce the problem. The below is not
> written in clear language either, so I'm not following along. I'd be happy
> to help look at this more once clear steps for reproducing the problem are
> available. Otherwise, since you're already talking with Shankar offline,
> I'll leave you to it, Anoop.
>
>> Also, when a file cannot be read, it is not moved under the corrupt logs
>> folder, which is a concerning thing. Need to look at that.
>
> Agreed.
>
>
>> On Jul 27, 2014, at 1:07 AM, Anoop John <an...@gmail.com> wrote:
>>
>> As per Shankar, he can get things to work with the below configs
>>
>> <property>
>>        <name>hbase.regionserver.hlog.reader.impl</name>
>>
>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>> </property>
>> <property>
>>        <name>hbase.regionserver.hlog.writer.impl</name>
>>
>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>> </property>
>> <property>
>>        <name>hbase.regionserver.wal.encryption</name>
>>        <value>false</value>
>> </property>
>>
>> Once the RS crash happened, the config was kept the above way. Note that
>> WAL encryption is disabled now, but the reader is still
>> SecureProtobufLogReader. The existing WAL files were written with encryption
>> and only SecureProtobufLogReader can read them. If that reader is not
>> configured, the default reader, ProtobufLogReader, cannot read them back
>> correctly. So this is the issue that Shankar faced.
>>
>> Also, when a file cannot be read, it is not moved under the corrupt logs
>> folder, which is a concerning thing. Need to look at that.
>>
>> -Anoop-
>>
>> On Sat, Jul 26, 2014 at 11:17 PM, Andrew Purtell <
andrew.purtell@gmail.com>
>> wrote:
>>
>>> My attempt to reproduce this issue:
>>>
>>> 1. Set up Hadoop 2.4.1 namenode, secondarynamenode, and datanode on a
dev
>>> box.
>>>
>>> 2. Set up HBase 0.98.5-SNAPSHOT hosted zk, master, and regionserver
also on
>>> this dev box.
>>>
>>> 3. Set dfs.replication and
hbase.regionserver.hlog.tolerable.lowreplication
>>> to 1. Set up a keystore and enabled WAL encryption.
>>>
>>> 4. Created a test table.
>>>
>>> 5. Used YCSB to write 1000 rows to the test table. No flushes observed.
>>>
>>> 6. Used the shell to count the number of records in the test table.
Count =
>>> 1000 rows
>>>
>>> 7. kill -9 the regionserver process.
>>>
>>> 8. Started a new regionserver process. Observed log splitting and
replay in
>>> the regionserver log, no errors.
>>>
>>> 9. Used the shell to count the number of records in the test table.
Count =
>>> 1000 rows
>>>
>>> Tried this a few times.
>>>
>>> Shankar, can you try running through the above and let us know if the
>>> outcome is different?
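
For anyone else trying the same repro, a rough sketch of what step 3 above looks like as config,
as I read it (dfs.replication goes in hdfs-site.xml, the rest in hbase-site.xml; the keystore and
SecureProtobufLog reader/writer settings are the same ones quoted further down in this thread):

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hbase.regionserver.hlog.tolerable.lowreplication</name>
  <value>1</value>
</property>
<property>
  <name>hbase.regionserver.wal.encryption</name>
  <value>true</value>
</property>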
>>>
>>>
>>>
>>> On Sat, Jul 26, 2014 at 8:54 AM, Andrew Purtell <
andrew.purtell@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the detail. So to summarize:
>>>>
>>>> 0. HBase 0.98.3 and HDFS 2.4.1
>>>>
>>>> 1. All data before failure has not yet been flushed so only exists in
the
>>>> WAL files.
>>>>
>>>> 2. During distributed splitting, the WAL has either not been written
out
>>>> or is unreadable:
>>>>
>>>>
>>>> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>> codec.BaseDecoder: Partial cell read caused by EOF:
java.io.IOException:
>>>> Premature EOF from inputStream
>>>>
>>>>
>>>> 3. This file is still moved to oldWALs even though splitting failed.
>>>>
>>>> 4. Setting 'hbase.regionserver.wal.encryption' to false allows for data
>>>> recovery in your scenario.
>>>>
>>>> See https://issues.apache.org/jira/browse/HBASE-11595
>>>>
>>>>
>>>>
>>>>
>>>> On Jul 26, 2014, at 6:50 AM, Shankar hiremath <
>>> shankar.hiremath@huawei.com>
>>>> wrote:
>>>>
>>>>
>>>> Hi Andrew,
>>>>
>>>>
>>>> Please find the details
>>>>
>>>>
>>>> Hbase 0.98.3 & hadoop 2.4.1
>>>>
>>>> Hbase root file system on hdfs
>>>>
>>>>
>>>> On Hmaster side there is no failure or error message in the log file
>>>>
>>>> On the Region Server side, the below error message is reported:
>>>>
>>>>
>>>> Region Server Log:
>>>>
>>>> 2014-07-26 19:29:15,904 DEBUG
[regionserver60020-SendThread(host2:2181)]
>>>> zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c,
packet::
>>>> clientPath:null serverPath:null finished:false header:: 172,4
>>>> replyHeader:: 172,4294988825,0  request:: '/hbase/table/hbase:acl,F
>>>> response::
>>>
#ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
>>>>
>>>> 2014-07-26 19:29:15,905 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,905 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,905 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,906 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,906 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,906 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,906 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,906 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,906 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,907 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,907 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,907 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,907 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,907 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,907 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
>>>>
>>>> 2014-07-26 19:29:15,907 DEBUG
[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15]
>>>> wal.HLogSplitter: Writer thread
>>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting
>>>>
>>>>
>>>> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>> codec.BaseDecoder: Partial cell read caused by EOF:
java.io.IOException:
>>>> Premature EOF from inputStream
>>>>
>>>>
>>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>> wal.HLogSplitter: Finishing writing output logs and closing down.
>>>>
>>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>> wal.HLogSplitter: Waiting for split writer threads to finish
>>>>
>>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>> wal.HLogSplitter: Split writers finished
>>>>
>>>> 2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>> wal.HLogSplitter: Processed 0 edits across 0 regions; log
>>>
file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta
>>>> is corrupted = false progress failed = false
>>>>
>>>> 2014-07-26 19:29:16,184 DEBUG
[regionserver60020-SendThread(host2:2181)]
>>>> zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c
>>>>
>>>>
>>>>
>>>> When I query the table, the data that was in the WAL files (before the
>>>> RegionServer machine went down) is not coming back.
>>>>
>>>> One more thing I observed: even when the WAL file is not successfully
>>>> processed, it is still moved to the /oldWALs folder.
>>>>
>>>> So when I revert the below 3 configurations on the Region Server side and
>>>> restart, the WAL will not get processed, since it has already been moved
>>>> to the oldWALs/ folder.
>>>>
>>>>
>>>> <property>
>>>>
>>>>   <name>hbase.regionserver.hlog.reader.impl</name>
>>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>>  <name>hbase.regionserver.hlog.writer.impl</name>
>>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>>  <name>hbase.regionserver.wal.encryption</name>
>>>>
>>>>  <value>true</value>
>>>>
>>>> </property>
>>>
-------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> And one more scenario I tried (Anoop suggested): with the below
>>>> configuration (instead of deleting the below 3 config parameters,
>>>> keep all of them but set only 'hbase.regionserver.wal.encryption=false'),
>>>> the encrypted WAL file is processed successfully, and querying the table
>>>> returns the WAL data (from before the RegionServer machine went down)
>>>> correctly.
>>>>
>>>>
>>>> <property>
>>>>
>>>>  <name>hbase.regionserver.hlog.reader.impl</name>
>>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>>  <name>hbase.regionserver.hlog.writer.impl</name>
>>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>>  <name>hbase.regionserver.wal.encryption</name>
>>>>
>>>>  <value>false</value>
>>>>
>>>> </property>
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>> -Shankar
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>>
>>>> From: andrew.purtell@gmail.com [mailto:andrew.purtell@gmail.com
>>>> <an...@gmail.com>] On Behalf Of Andrew Purtell
>>>>
>>>> Sent: 26 July 2014 AM 02:21
>>>>
>>>> To: user@hbase.apache.org
>>>>
>>>> Subject: Re: HBase file encryption, inconsistencies observed and data
>>> loss
>>>>
>>>>
>>>> Encryption (or the lack of it) doesn't explain missing HFiles.
>>>>
>>>>
>>>> Most likely if you are having a problem with encryption, this will
>>>> manifest as follows: HFiles will be present. However, you will find
many
>>>> IOExceptions in the regionserver logs as they attempt to open the
HFiles
>>>> but fail because the data is unreadable.
>>>>
>>>>
>>>> We should start by looking at more basic issues. What could explain the
>>>> total disappearance of HFiles.
>>>>
>>>>
>>>> Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on
>>>> the local filesystem (fs URL starts with file://)?
>>>>
>>>>
>>>> In your email you provide only exceptions printed by the client. What
>>> kind
>>>> of exceptions appear in the regionserver logs? Or appear in the master
>>> log?
>>>>
>>>> If the logs are large your best bet is to pastebin them and then send
the
>>>> URL to the paste in your response.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath <
>>>> shankar.hiremath@huawei.com> wrote:
>>>>
>>>>
>>>> HBase file encryption some inconsistencies observed and data loss
>>>>
>>>> happens after running the hbck tool,
>>>>
>>>> the operation steps are as below.    (one thing what I observed is, on
>>>>
>>>> startup of HMaster if it is not able to process the WAL file, then
>>>>
>>>> also it moved to /oldWALs)
>>>>
>>>>
>>>> Procedure:
>>>>
>>>> 1. Start the Hbase services (HMaster & region Server) 2. Enable HFile
>>>>
>>>> encryption and WAL file encryption as below, and perform 'table4-0'
>>>>
>>>> put operations (100 records added) <property>
>>>>
>>>> <name>hbase.crypto.keyprovider</name>
>>>>
>>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.keyprovider.parameters</name>
>>>>
>>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>>>>
>>>> </value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.master.key.name</name>
>>>>
>>>> <value>hdfs</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hfile.format.version</name>
>>>>
>>>> <value>3</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.hlog.reader.impl</name>
>>>>
>>>>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>>>>
>>>> r</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.hlog.writer.impl</name>
>>>>
>>>>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
>>>>
>>>> r</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.wal.encryption</name>
>>>>
>>>> <value>true</value>
>>>>
>>>> </property>
>>>>
>>>> 3. Machine went down, so all process went down
>>>>
>>>>
>>>> 4. We disabled the WAL file encryption for performance reason, and
>>>>
>>>> keep encryption only for Hfile, as below <property>
>>>>
>>>> <name>hbase.crypto.keyprovider</name>
>>>>
>>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.keyprovider.parameters</name>
>>>>
>>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>>>>
>>>> </value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.master.key.name</name>
>>>>
>>>> <value>hdfs</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hfile.format.version</name>
>>>>
>>>> <value>3</value>
>>>>
>>>> </property>
>>>>
>>>> 5. Start the Region Server and query the 'table4-0' data
>>>>
>>>> hbase(main):003:0> count 'table4-0'
>>>>
>>>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
>>>>
>>>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
>>>>
>>>> online on
>>>>
>>>> XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
>>>>
>>>> ame(HRegionServer.java:2685)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
>>>>
>>>> rver.java:4119)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
>>>>
>>>> java:3066)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
>>>>
>>>> 2.callBlockingMethod(ClientProtos.java:29497)
>>>>
>>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>>>>
>>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
>>>>
>>>> cheduler.java:168)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
>>>>
>>>> eduler.java:39)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
>>>>
>>>> r.java:111) at java.lang.Thread.run(Thread.java:662)
>>>>
>>>> 6. Not able to read the data, so we decided to revert back the
>>>>
>>>> configuration (as original) 7. Kill/Stop the Region Server, revert all
>>>>
>>>> the configurations as original, as below <property>
>>>>
>>>> <name>hbase.crypto.keyprovider</name>
>>>>
>>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.keyprovider.parameters</name>
>>>>
>>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>>>>
>>>> </value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.master.key.name</name>
>>>>
>>>> <value>hdfs</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hfile.format.version</name>
>>>>
>>>> <value>3</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.hlog.reader.impl</name>
>>>>
>>>>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>>>>
>>>> r</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.hlog.writer.impl</name>
>>>>
>>>>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
>>>>
>>>> r</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.wal.encryption</name>
>>>>
>>>> <value>true</value>
>>>>
>>>> </property>
>>>>
>>>> 7. Start the Region Server, and perform the 'table4-0' query
>>>>
>>>> hbase(main):003:0> count 'table4-0'
>>>>
>>>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
>>>>
>>>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
>>>>
>>>> online on
>>>>
>>>> XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
>>>>
>>>> ame(HRegionServer.java:2685)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
>>>>
>>>> rver.java:4119)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
>>>>
>>>> java:3066)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
>>>>
>>>> 2.callBlockingMethod(ClientProtos.java:29497)
>>>>
>>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>>>>
>>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
>>>>
>>>> cheduler.java:168)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
>>>>
>>>> eduler.java:39)
>>>>
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
>>>>
>>>> r.java:111) at java.lang.Thread.run(Thread.java:662)
>>>>
>>>> 8. Run the hbase hbck to repair, as below ./hbase hbck -details
>>>>
>>>> .........................
>>>>
>>>> Summary:
>>>>
>>>> table1-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table2-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table3-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table4-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table5-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table6-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table7-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table8-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table9-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> hbase:meta is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> hbase:namespace is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> 22 inconsistencies detected.
>>>>
>>>> Status: INCONSISTENT
>>>>
>>>> 2014-07-24 19:13:05,532 INFO [main]
>>>>
>>>> client.HConnectionManager$HConnectionImplementation: Closing master
>>>>
>>>> protocol: MasterService
>>>>
>>>> 2014-07-24 19:13:05,533 INFO [main]
>>>>
>>>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>>>>
>>>> sessionid=0x1475d1611611bcf
>>>>
>>>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing
>>> session:
>>>>
>>>> 0x1475d1611611bcf
>>>>
>>>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing
>>>>
>>>> client for session: 0x1475d1611611bcf
>>>>
>>>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>
>>>> zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf,
packet::
>>>>
>>>> clientPath:null serverPath:null finished:false header:: 6,-11
>>> replyHeader::
>>>>
>>>> 6,4295102074,0 request:: null response:: null
>>>>
>>>> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn:
>>>>
>>>> Disconnecting client for session: 0x1475d1611611bcf
>>>>
>>>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>
>>>> zookeeper.ClientCnxn: An exception was thrown while closing send
>>>>
>>>> thread for session 0x1475d1611611bcf : Unable to read additional data
>>>>
>>>> from server sessionid 0x1475d1611611bcf, likely server has closed
>>>>
>>>> socket
>>>>
>>>> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
>>>>
>>>> EventThread shut down
>>>>
>>>> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
>>>>
>>>> 0x1475d1611611bcf closed
>>>>
>>>> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
>>>>
>>>> 9. Fix the assignments as below
>>>>
>>>> ./hbase hbck -fixAssignments
>>>>
>>>> Summary:
>>>>
>>>> table1-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table2-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table3-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table4-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table5-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table6-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table7-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table8-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table9-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> 0 inconsistencies detected.
>>>>
>>>> Status: OK
>>>>
>>>> 2014-07-24 19:44:55,194 INFO [main]
>>>>
>>>> client.HConnectionManager$HConnectionImplementation: Closing master
>>>>
>>>> protocol: MasterService
>>>>
>>>> 2014-07-24 19:44:55,194 INFO [main]
>>>>
>>>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>>>>
>>>> sessionid=0x2475d15f7b31b73
>>>>
>>>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing
>>> session:
>>>>
>>>> 0x2475d15f7b31b73
>>>>
>>>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing
>>>>
>>>> client for session: 0x2475d15f7b31b73
>>>>
>>>> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>
>>>> zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73,
packet::
>>>>
>>>> clientPath:null serverPath:null finished:false header:: 7,-11
>>> replyHeader::
>>>>
>>>> 7,4295102377,0 request:: null response:: null
>>>>
>>>> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn:
>>>>
>>>> Disconnecting client for session: 0x2475d15f7b31b73
>>>>
>>>> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>
>>>> zookeeper.ClientCnxn: An exception was thrown while closing send
>>>>
>>>> thread for session 0x2475d15f7b31b73 : Unable to read additional data
>>>>
>>>> from server sessionid 0x2475d15f7b31b73, likely server has closed
>>>>
>>>> socket
>>>>
>>>> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
>>>>
>>>> 0x2475d15f7b31b73 closed
>>>>
>>>> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
>>>>
>>>> EventThread shut down
>>>>
>>>> 10. Fix the assignments as below
>>>>
>>>> ./hbase hbck -fixAssignments -fixMeta
>>>>
>>>> Summary:
>>>>
>>>> table1-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table2-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table3-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table4-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table5-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table6-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table7-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table8-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table9-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> 0 inconsistencies detected.
>>>>
>>>> Status: OK
>>>>
>>>> 2014-07-24 19:46:16,290 INFO [main]
>>>>
>>>> client.HConnectionManager$HConnectionImplementation: Closing master
>>>>
>>>> protocol: MasterService
>>>>
>>>> 2014-07-24 19:46:16,290 INFO [main]
>>>>
>>>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>>>>
>>>> sessionid=0x3475d1605321be9
>>>>
>>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing
>>> session:
>>>>
>>>> 0x3475d1605321be9
>>>>
>>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing
>>>>
>>>> client for session: 0x3475d1605321be9
>>>>
>>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>
>>>> zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9,
packet::
>>>>
>>>> clientPath:null serverPath:null finished:false header:: 6,-11
>>> replyHeader::
>>>>
>>>> 6,4295102397,0 request:: null response:: null
>>>>
>>>> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn:
>>>>
>>>> Disconnecting client for session: 0x3475d1605321be9
>>>>
>>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>>>
>>>> zookeeper.ClientCnxn: An exception was thrown while closing send
>>>>
>>>> thread for session 0x3475d1605321be9 : Unable to read additional data
>>>>
>>>> from server sessionid 0x3475d1605321be9, likely server has closed
>>>>
>>>> socket
>>>>
>>>> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
>>>>
>>>> 0x3475d1605321be9 closed
>>>>
>>>> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
>>>>
>>>> EventThread shut down
>>>>
>>>> hbase(main):006:0> count 'table4-0'
>>>>
>>>> 0 row(s) in 0.0200 seconds
>>>>
>>>> => 0
>>>>
>>>> hbase(main):007:0>
>>>>
>>>> Complete data loss happened,
>>>>
>>>> WALs, oldWALs & /hbase/data/default/table4-0/ does not have any data
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> - Andy
>>>>
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet
Hein
>>>> (via Tom White)
>>>
>

Re: HBase file encryption, inconsistencies observed and data loss

Posted by Andrew Purtell <an...@gmail.com>.
So the regionserver configuration was changed after it crashed but before it was restarted?

The impression given by the initial report is that simply using encrypted WALs will cause data loss. That's not the case, as I have confirmed. There could be an edge case somewhere, but the original reporter has left out important details about how to reproduce the problem. The below is not written in clear language either, so I'm not following along. I'd be happy to help look at this more once clear steps for reproducing the problem are available. Otherwise, since you're already talking with Shankar offline, I'll leave you to it, Anoop.

> Also, when a file cannot be read, it is not moved under the corrupt logs folder, which is a concerning thing. Need to look at that.

Agreed. 


> On Jul 27, 2014, at 1:07 AM, Anoop John <an...@gmail.com> wrote:
> 
> As per Shankar, he can get things to work with the below configs
> 
> <property>
>        <name>hbase.regionserver.hlog.reader.impl</name>
> 
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
>        <name>hbase.regionserver.hlog.writer.impl</name>
> 
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
>        <name>hbase.regionserver.wal.encryption</name>
>        <value>false</value>
> </property>
> 
> Once the RS crash happened, the config was kept the above way. Note that
> WAL encryption is disabled now, but the reader is still
> SecureProtobufLogReader. The existing WAL files were written with encryption
> and only SecureProtobufLogReader can read them. If that reader is not
> configured, the default reader, ProtobufLogReader, cannot read them back
> correctly. So this is the issue that Shankar faced.
> 
> Also, when a file cannot be read, it is not moved under the corrupt logs
> folder, which is a concerning thing. Need to look at that.
> 
> -Anoop-
> 
> On Sat, Jul 26, 2014 at 11:17 PM, Andrew Purtell <an...@gmail.com>
> wrote:
> 
>> My attempt to reproduce this issue:
>> 
>> 1. Set up Hadoop 2.4.1 namenode, secondarynamenode, and datanode on a dev
>> box.
>> 
>> 2. Set up HBase 0.98.5-SNAPSHOT hosted zk, master, and regionserver also on
>> this dev box.
>> 
>> 3. Set dfs.replication and hbase.regionserver.hlog.tolerable.lowreplication
>> to 1. Set up a keystore and enabled WAL encryption.
>> 
>> 4. Created a test table.
>> 
>> 5. Used YCSB to write 1000 rows to the test table. No flushes observed.
>> 
>> 6. Used the shell to count the number of records in the test table. Count =
>> 1000 rows
>> 
>> 7. kill -9 the regionserver process.
>> 
>> 8. Started a new regionserver process. Observed log splitting and replay in
>> the regionserver log, no errors.
>> 
>> 9. Used the shell to count the number of records in the test table. Count =
>> 1000 rows
>> 
>> Tried this a few times.
>> 
>> Shankar, can you try running through the above and let us know if the
>> outcome is different?
>> 
>> 
>> 
>> On Sat, Jul 26, 2014 at 8:54 AM, Andrew Purtell <an...@gmail.com>
>> wrote:
>> 
>>> Thanks for the detail. So to summarize:
>>> 
>>> 0. HBase 0.98.3 and HDFS 2.4.1
>>> 
>>> 1. All data before failure has not yet been flushed so only exists in the
>>> WAL files.
>>> 
>>> 2. During distributed splitting, the WAL has either not been written out
>>> or is unreadable:
>>> 
>>> 
>>> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
>>> codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException:
>>> Premature EOF from inputStream
>>> 
>>> 
>>> 3. This file is still moved to oldWALs even though splitting failed.
>>> 
>>> 4. Setting 'hbase.regionserver.wal.encryption' to false allows for data
>>> recovery in your scenario.
>>> 
>>> See https://issues.apache.org/jira/browse/HBASE-11595
>>> 
>>> 
>>> 
>>> 
>>> On Jul 26, 2014, at 6:50 AM, Shankar hiremath <
>> shankar.hiremath@huawei.com>
>>> wrote:
>>> 
>>> 
>>> Hi Andrew,
>>> 
>>> 
>>> Please find the details
>>> 
>>> 
>>> Hbase 0.98.3 & hadoop 2.4.1
>>> 
>>> Hbase root file system on hdfs
>>> 
>>> 
>>> On Hmaster side there is no failure or error message in the log file
>>> 
>>> On the Region Server side, the below error message is reported:
>>> 
>>> 
>>> Region Server Log:
>>> 
>>> 2014-07-26 19:29:15,904 DEBUG [regionserver60020-SendThread(host2:2181)]
>>> zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c, packet::
>>> clientPath:null serverPath:null finished:false header:: 172,4
>>> replyHeader:: 172,4294988825,0  request:: '/hbase/table/hbase:acl,F
>>> response::
>> #ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
>>> 
>>> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
>>> 
>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15]
>>> wal.HLogSplitter: Writer thread
>>> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting
>>> 
>>> 
>>> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
>>> codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException:
>>> Premature EOF from inputStream
>>> 
>>> 
>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>> wal.HLogSplitter: Finishing writing output logs and closing down.
>>> 
>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>> wal.HLogSplitter: Waiting for split writer threads to finish
>>> 
>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>> wal.HLogSplitter: Split writers finished
>>> 
>>> 2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>>> wal.HLogSplitter: Processed 0 edits across 0 regions; log
>> file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta
>>> is corrupted = false progress failed = false
>>> 
>>> 2014-07-26 19:29:16,184 DEBUG [regionserver60020-SendThread(host2:2181)]
>>> zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c
>>> 
>>> 
>>> 
>>> When I query the table, the data that was in the WAL files (before the
>>> RegionServer machine went down) is not coming back.
>>> 
>>> One more thing I observed: even when the WAL file is not successfully
>>> processed, it is still moved to the /oldWALs folder.
>>> 
>>> So when I revert the below 3 configurations on the Region Server side and
>>> restart, the WAL will not get processed, since it has already been moved
>>> to the oldWALs/ folder.
>>> 
>>> 
>>> <property>
>>> 
>>>   <name>hbase.regionserver.hlog.reader.impl</name>
>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>>  <name>hbase.regionserver.hlog.writer.impl</name>
>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>>  <name>hbase.regionserver.wal.encryption</name>
>>> 
>>>  <value>true</value>
>>> 
>>> </property>
>> -------------------------------------------------------------------------------------------------------------
>>> 
>>> 
>>> And one more scenario I tried (Anoop suggested): with the below
>>> configuration (instead of deleting the below 3 config parameters,
>>> keep all of them but set only 'hbase.regionserver.wal.encryption=false'),
>>> the encrypted WAL file is processed successfully, and querying the table
>>> returns the WAL data (from before the RegionServer machine went down)
>>> correctly.
>>> 
>>> 
>>> <property>
>>> 
>>>  <name>hbase.regionserver.hlog.reader.impl</name>
>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>>  <name>hbase.regionserver.hlog.writer.impl</name>
>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>>  <name>hbase.regionserver.wal.encryption</name>
>>> 
>>>  <value>false</value>
>>> 
>>> </property>
>>> 
>>> 
>>> 
>>> Regards
>>> 
>>> -Shankar
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> 
>>> From: andrew.purtell@gmail.com [mailto:andrew.purtell@gmail.com
>>> <an...@gmail.com>] On Behalf Of Andrew Purtell
>>> 
>>> Sent: 26 July 2014 AM 02:21
>>> 
>>> To: user@hbase.apache.org
>>> 
>>> Subject: Re: HBase file encryption, inconsistencies observed and data
>> loss
>>> 
>>> 
>>> Encryption (or the lack of it) doesn't explain missing HFiles.
>>> 
>>> 
>>> Most likely if you are having a problem with encryption, this will
>>> manifest as follows: HFiles will be present. However, you will find many
>>> IOExceptions in the regionserver logs as they attempt to open the HFiles
>>> but fail because the data is unreadable.
>>> 
>>> 
>>> We should start by looking at more basic issues. What could explain the
>>> total disappearance of HFiles.
>>> 
>>> 
>>> Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on
>>> the local filesystem (fs URL starts with file://)?
>>> 
>>> 
>>> In your email you provide only exceptions printed by the client. What
>> kind
>>> of exceptions appear in the regionserver logs? Or appear in the master
>> log?
>>> 
>>> If the logs are large your best bet is to pastebin them and then send the
>>> URL to the paste in your response.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath <
>>> shankar.hiremath@huawei.com> wrote:
>>> 
>>> 
>>> HBase file encryption some inconsistencies observed and data loss
>>> 
>>> happens after running the hbck tool,
>>> 
>>> the operation steps are as below.    (one thing what I observed is, on
>>> 
>>> startup of HMaster if it is not able to process the WAL file, then
>>> 
>>> also it moved to /oldWALs)
>>> 
>>> 
>>> Procedure:
>>> 
>>> 1. Start the Hbase services (HMaster & region Server) 2. Enable HFile
>>> 
>>> encryption and WAL file encryption as below, and perform 'table4-0'
>>> 
>>> put operations (100 records added) <property>
>>> 
>>> <name>hbase.crypto.keyprovider</name>
>>> 
>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.crypto.keyprovider.parameters</name>
>>> 
>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>>> 
>>> </value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.crypto.master.key.name</name>
>>> 
>>> <value>hdfs</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hfile.format.version</name>
>>> 
>>> <value>3</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.regionserver.hlog.reader.impl</name>
>>> 
>>> 
>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>>> 
>>> r</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.regionserver.hlog.writer.impl</name>
>>> 
>>> 
>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
>>> 
>>> r</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.regionserver.wal.encryption</name>
>>> 
>>> <value>true</value>
>>> 
>>> </property>
>>> 
>>> 3. Machine went down, so all process went down
>>> 
>>> 
>>> 4. We disabled the WAL file encryption for performance reason, and
>>> 
>>> keep encryption only for Hfile, as below <property>
>>> 
>>> <name>hbase.crypto.keyprovider</name>
>>> 
>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.crypto.keyprovider.parameters</name>
>>> 
>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>>> 
>>> </value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.crypto.master.key.name</name>
>>> 
>>> <value>hdfs</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hfile.format.version</name>
>>> 
>>> <value>3</value>
>>> 
>>> </property>
>>> 
>>> 5. Start the Region Server and query the 'table4-0' data
>>> 
>>> hbase(main):003:0> count 'table4-0'
>>> 
>>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
>>> 
>>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
>>> 
>>> online on
>>> 
>>> XX-XX-XX-XX,60020,1406209023146
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
>>> 
>>> ame(HRegionServer.java:2685)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
>>> 
>>> rver.java:4119)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
>>> 
>>> java:3066)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
>>> 
>>> 2.callBlockingMethod(ClientProtos.java:29497)
>>> 
>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>>> 
>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
>>> 
>>> cheduler.java:168)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
>>> 
>>> eduler.java:39)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
>>> 
>>> r.java:111) at java.lang.Thread.run(Thread.java:662)
>>> 
>>> 6. Not able to read the data, so we decided to revert back the
>>> 
>>> configuration (as original) 7. Kill/Stop the Region Server, revert all
>>> 
>>> the configurations as original, as below <property>
>>> 
>>> <name>hbase.crypto.keyprovider</name>
>>> 
>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.crypto.keyprovider.parameters</name>
>>> 
>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>>> 
>>> </value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.crypto.master.key.name</name>
>>> 
>>> <value>hdfs</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hfile.format.version</name>
>>> 
>>> <value>3</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.regionserver.hlog.reader.impl</name>
>>> 
>>> 
>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>>> 
>>> r</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.regionserver.hlog.writer.impl</name>
>>> 
>>> 
>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
>>> 
>>> r</value>
>>> 
>>> </property>
>>> 
>>> <property>
>>> 
>>> <name>hbase.regionserver.wal.encryption</name>
>>> 
>>> <value>true</value>
>>> 
>>> </property>
>>> 
>>> 7. Start the Region Server, and perform the 'table4-0' query
>>> 
>>> hbase(main):003:0> count 'table4-0'
>>> 
>>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
>>> 
>>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
>>> 
>>> online on
>>> 
>>> XX-XX-XX-XX,60020,1406209023146
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
>>> 
>>> ame(HRegionServer.java:2685)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
>>> 
>>> rver.java:4119)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
>>> 
>>> java:3066)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
>>> 
>>> 2.callBlockingMethod(ClientProtos.java:29497)
>>> 
>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>>> 
>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
>>> 
>>> cheduler.java:168)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
>>> 
>>> eduler.java:39)
>>> 
>>> at
>>> 
>>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
>>> 
>>> r.java:111) at java.lang.Thread.run(Thread.java:662)
>>> 
>>> 8. Run the hbase hbck to repair, as below ./hbase hbck -details
>>> 
>>> .........................
>>> 
>>> Summary:
>>> 
>>> table1-0 is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> table2-0 is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> table3-0 is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> table4-0 is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> table5-0 is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> table6-0 is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> table7-0 is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> table8-0 is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> table9-0 is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> hbase:meta is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> hbase:namespace is okay.
>>> 
>>> Number of regions: 0
>>> 
>>> Deployed on:
>>> 
>>> 22 inconsistencies detected.
>>> 
>>> Status: INCONSISTENT
>>> 
>>> 2014-07-24 19:13:05,532 INFO [main]
>>> 
>>> client.HConnectionManager$HConnectionImplementation: Closing master
>>> 
>>> protocol: MasterService
>>> 
>>> 2014-07-24 19:13:05,533 INFO [main]
>>> 
>>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>>> 
>>> sessionid=0x1475d1611611bcf
>>> 
>>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x1475d1611611bcf
>>>
>>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x1475d1611611bcf
>>>
>>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet:: clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader:: 6,4295102074,0 request:: null response:: null
>>> 
>>> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn:
>>> 
>>> Disconnecting client for session: 0x1475d1611611bcf
>>> 
>>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>> 
>>> zookeeper.ClientCnxn: An exception was thrown while closing send
>>> 
>>> thread for session 0x1475d1611611bcf : Unable to read additional data
>>> 
>>> from server sessionid 0x1475d1611611bcf, likely server has closed
>>> 
>>> socket
>>> 
>>> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
>>> 
>>> EventThread shut down
>>> 
>>> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
>>> 
>>> 0x1475d1611611bcf closed
>>> 
>>> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
>>> 
>>> 9. Fix the assignments as below
>>> 
>>> ./hbase hbck -fixAssignments
>>> 
>>> Summary:
>>> 
>>> table1-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table2-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table3-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table4-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table5-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table6-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table7-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table8-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table9-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> 0 inconsistencies detected.
>>> 
>>> Status: OK
>>> 
>>> 2014-07-24 19:44:55,194 INFO [main]
>>> 
>>> client.HConnectionManager$HConnectionImplementation: Closing master
>>> 
>>> protocol: MasterService
>>> 
>>> 2014-07-24 19:44:55,194 INFO [main]
>>> 
>>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>>> 
>>> sessionid=0x2475d15f7b31b73
>>> 
>>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x2475d15f7b31b73
>>>
>>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x2475d15f7b31b73
>>>
>>> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet:: clientPath:null serverPath:null finished:false header:: 7,-11 replyHeader:: 7,4295102377,0 request:: null response:: null
>>> 
>>> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn:
>>> 
>>> Disconnecting client for session: 0x2475d15f7b31b73
>>> 
>>> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>> 
>>> zookeeper.ClientCnxn: An exception was thrown while closing send
>>> 
>>> thread for session 0x2475d15f7b31b73 : Unable to read additional data
>>> 
>>> from server sessionid 0x2475d15f7b31b73, likely server has closed
>>> 
>>> socket
>>> 
>>> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
>>> 
>>> 0x2475d15f7b31b73 closed
>>> 
>>> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
>>> 
>>> EventThread shut down
>>> 
>>> 10. Fix the assignments as below
>>> 
>>> ./hbase hbck -fixAssignments -fixMeta
>>> 
>>> Summary:
>>> 
>>> table1-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table2-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table3-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table4-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table5-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table6-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table7-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table8-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> table9-0 is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
>>> 
>>> Number of regions: 1
>>> 
>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>> 
>>> 0 inconsistencies detected.
>>> 
>>> Status: OK
>>> 
>>> 2014-07-24 19:46:16,290 INFO [main]
>>> 
>>> client.HConnectionManager$HConnectionImplementation: Closing master
>>> 
>>> protocol: MasterService
>>> 
>>> 2014-07-24 19:46:16,290 INFO [main]
>>> 
>>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>>> 
>>> sessionid=0x3475d1605321be9
>>> 
>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x3475d1605321be9
>>>
>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x3475d1605321be9
>>>
>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet:: clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader:: 6,4295102397,0 request:: null response:: null
>>> 
>>> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn:
>>> 
>>> Disconnecting client for session: 0x3475d1605321be9
>>> 
>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>>> 
>>> zookeeper.ClientCnxn: An exception was thrown while closing send
>>> 
>>> thread for session 0x3475d1605321be9 : Unable to read additional data
>>> 
>>> from server sessionid 0x3475d1605321be9, likely server has closed
>>> 
>>> socket
>>> 
>>> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
>>> 
>>> 0x3475d1605321be9 closed
>>> 
>>> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
>>> 
>>> EventThread shut down
>>> 
>>> hbase(main):006:0> count 'table4-0'
>>> 
>>> 0 row(s) in 0.0200 seconds
>>> 
>>> => 0
>>> 
>>> hbase(main):007:0>
>>> 
>>> Complete data loss happened: WALs, oldWALs & /hbase/data/default/table4-0/ do not contain any data.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Best regards,
>>> 
>>> 
>>> - Andy
>>> 
>>> 
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>> 

Re: HBase file encryption, inconsistencies observed and data loss

Posted by Anoop John <an...@gmail.com>.
SecureProtobufLogReader can read encrypted as well as unencrypted files.
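
A quick way to sanity-check that for a particular file is the WAL dump tool (a rough sketch only, assuming the 0.98-era 'hbase hlog' command; the file path below is just an example modelled on the paths quoted later in this thread):

./hbase hlog hdfs://hacluster/hbase/oldWALs/host1%2C60020%2C1406383007151.1406383069334

If the reader configured in hbase-site.xml cannot decode the file, the dump should fail with an IOException rather than printing the edits.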

Anoop

On Sunday, July 27, 2014, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:
> I think in the above case, even though encryption is disabled, we will need
> to use the SecureProtobufLogReader for the new files that will be created
> as well? I don't have the code with me now, but if that is the case we need
> to look at it, as I feel only the existing files should be read with the
> SecureProtobufLogReader; a new WAL should be read using the plain log
> reader.
> Moving the file to the corrupt folder is fine, provided we can bring it
> back into the main working flow.
> Sent from mobile, excuse any typos.
> On Jul 27, 2014 10:07 AM, "Anoop John" <an...@gmail.com> wrote:
>
>> As per Shankar he can get things work with below configs
>>
>> <property>
>>         <name>hbase.regionserver.hlog.reader.impl</name>
>>
>>
>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>> </property>
>> <property>
>>         <name>hbase.regionserver.hlog.writer.impl</name>
>>
>>
>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>> </property>
>> <property>
>>         <name>hbase.regionserver.wal.encryption</name>
>>         <value>false</value>
>> </property>
>>
>> Once the RS crash happened, the config was kept the above way. Note that
>> WAL encryption is now disabled, but the reader is still
>> SecureProtobufLogReader. The existing WAL files were written with
>> encryption and only SecureProtobufLogReader can read them. If that reader
>> is not configured, the default ProtobufLogReader is used, and it cannot
>> read them back correctly. So this is the issue that Shankar faced.
>>
>> Also, the fact that the file is not moved under the corrupt logs directory
>> when it cannot be read is a concerning thing. We need to look at that.
>>
>> -Anoop-
>>
>> On Sat, Jul 26, 2014 at 11:17 PM, Andrew Purtell <
andrew.purtell@gmail.com
>> >
>> wrote:
>>
>> > My attempt to reproduce this issue:
>> >
>> > 1. Set up Hadoop 2.4.1 namenode, secondarynamenode, and datanode on a dev box.
>> >
>> > 2. Set up HBase 0.98.5-SNAPSHOT hosted zk, master, and regionserver also on this dev box.
>> >
>> > 3. Set dfs.replication and hbase.regionserver.hlog.tolerable.lowreplication to 1. Set up a keystore and enabled WAL encryption.
>> >
>> > 4. Created a test table.
>> >
>> > 5. Used YCSB to write 1000 rows to the test table. No flushes observed.
>> >
>> > 6. Used the shell to count the number of records in the test table. Count = 1000 rows
>> >
>> > 7. kill -9 the regionserver process.
>> >
>> > 8. Started a new regionserver process. Observed log splitting and replay in the regionserver log, no errors.
>> >
>> > 9. Used the shell to count the number of records in the test table. Count = 1000 rows
>> >
>> > Tried this a few times.
>> >
>> > Shankar, can you try running through the above and let us know if the outcome is different?
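>> >
>> > In shell terms, the kill/restart portion of the above is roughly the
>> > following (a sketch only; the table name, pid file location, and use of
>> > hbase-daemon.sh are assumptions about a default single-node setup, not
>> > part of the original steps):
>> >
>> > echo "count 'usertable'" | ./hbase shell            # step 6, expect 1000
>> > kill -9 $(cat /tmp/hbase-$USER-regionserver.pid)    # step 7, hard-kill the RS
>> > ./hbase-daemon.sh start regionserver                # step 8, watch log splitting/replay in the RS log
>> > echo "count 'usertable'" | ./hbase shell            # step 9, expect 1000 again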
>> >
>> >
>> >
>> > On Sat, Jul 26, 2014 at 8:54 AM, Andrew Purtell <
>> andrew.purtell@gmail.com>
>> > wrote:
>> >
>> > > Thanks for the detail. So to summarize:
>> > >
>> > > 0. HBase 0.98.3 and HDFS 2.4.1
>> > >
>> > > 1. All data before failure has not yet been flushed so only exists in the WAL files.
>> > >
>> > > 2. During distributed splitting, the WAL has either not been written out or is unreadable:
>> > >
>> > > 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0] codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream
>> > >
>> > > 3. This file is still moved to oldWALs even though splitting failed.
>> > >
>> > > 4. Setting 'hbase.regionserver.wal.encryption' to false allows for data recovery in your scenario.
>> > >
>> > > See https://issues.apache.org/jira/browse/HBASE-11595
>> > >
>> > >
>> > >
>> > >
>> > > On Jul 26, 2014, at 6:50 AM, Shankar hiremath <
>> > shankar.hiremath@huawei.com>
>> > > wrote:
>> > >
>> > >
>> > > Hi Andrew,
>> > >
>> > >
>> > > Please find the details
>> > >
>> > >
>> > > Hbase 0.98.3 & hadoop 2.4.1
>> > >
>> > > Hbase root file system on hdfs
>> > >
>> > >
>> > > On the HMaster side there is no failure or error message in the log file.
>> > >
>> > > On the Region Server side, the below error message was reported:
>> > >
>> > >
>> > > Region Server Log:
>> > >
>> > > 2014-07-26 19:29:15,904 DEBUG
>> [regionserver60020-SendThread(host2:2181)]
>> > > zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c,
>> packet::
>> > > clientPath:null serverPath:null finished:false header:: 172,4
>> > >  replyHeader:: 172,4294988825,0  request:: '/hbase/table/hbase:acl,F
>> > >  response::
>> > >
>> >
>>
#ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
>> > >
>> > > 2014-07-26 19:29:15,905 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,905 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,905 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG
>> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15]
>> > > wal.HLogSplitter: Writer thread
>> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting
>> > >
>> > >
>> > > 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
>> > > codec.BaseDecoder: Partial cell read caused by EOF:
>> java.io.IOException:
>> > > Premature EOF from inputStream
>> > >
>> > >
>> > > 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>> > > wal.HLogSplitter: Finishing writing output logs and closing down.
>> > >
>> > > 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>> > > wal.HLogSplitter: Waiting for split writer threads to finish
>> > >
>> > > 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>> > > wal.HLogSplitter: Split writers finished
>> > >
>> > > 2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
>> > > wal.HLogSplitter: Processed 0 edits across 0 regions; log
>> > >
>> >
>>
file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta
>> > > is corrupted = false progress failed = false
>> > >
>> > > 2014-07-26 19:29:16,184 DEBUG
>> [regionserver60020-SendThread(host2:2181)]
>> > > zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c
>> > >
>> > >
>> > >
>> > > When I query the table, the data that was in the WAL files (before the
>> > > RegionServer machine went down) is not coming back.
>> > >
>> > > One more thing I observed is that even when the WAL file is not
>> > > successfully processed, it is still moved to the /oldWALs folder.
>> > >
>> > > So when I revert the below 3 configurations on the Region Server side
>> > > and restart, the WAL has already been moved to the oldWALs/ folder,
>> > > so it will not get processed.
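>> > >
>> > > This can be confirmed with something like the following (a sketch; the
>> > > paths are the defaults used elsewhere in this mail, and the grep pattern
>> > > is just the RS startcode from the splitting directory name):
>> > >
>> > > hdfs dfs -ls /hbase/WALs
>> > > hdfs dfs -ls /hbase/oldWALs | grep 1406383007151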
>> > >
>> > >
>> > > <property>
>> > >
>> > >    <name>hbase.regionserver.hlog.reader.impl</name>
>> > >
>> > >
>> > >
>> >
>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > >   <name>hbase.regionserver.hlog.writer.impl</name>
>> > >
>> > >
>> > >
>> >
>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > >   <name>hbase.regionserver.wal.encryption</name>
>> > >
>> > >   <value>true</value>
>> > >
>> > > </property>
>> > >
>> > >
>> > >
>> > >
>> >
>>
-------------------------------------------------------------------------------------------------------------
>> > >
>> > >
>> > > And one more scenario I tried (Anoop suggested): with the below
>> > > configuration (instead of deleting the below 3 config parameters, keep
>> > > them all but set only 'hbase.regionserver.wal.encryption=false'), the
>> > > encrypted WAL file is processed successfully, and querying the table
>> > > gives the WAL data (from before the RegionServer machine went down)
>> > > correctly.
>> > >
>> > >
>> > > <property>
>> > >
>> > >   <name>hbase.regionserver.hlog.reader.impl</name>
>> > >
>> > >
>> > >
>> >
>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > >   <name>hbase.regionserver.hlog.writer.impl</name>
>> > >
>> > >
>> > >
>> >
>>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > >   <name>hbase.regionserver.wal.encryption</name>
>> > >
>> > >   <value>false</value>
>> > >
>> > > </property>
>> > >
>> > >
>> > >
>> > > Regards
>> > >
>> > > -Shankar
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > -----Original Message-----
>> > >
>> > > From: andrew.purtell@gmail.com [mailto:andrew.purtell@gmail.com
>> > > <an...@gmail.com>] On Behalf Of Andrew Purtell
>> > >
>> > > Sent: 26 July 2014 AM 02:21
>> > >
>> > > To: user@hbase.apache.org
>> > >
>> > > Subject: Re: HBase file encryption, inconsistencies observed and data
>> > loss
>> > >
>> > >
>> > > Encryption (or the lack of it) doesn't explain missing HFiles.
>> > >
>> > >
>> > > Most likely if you are having a problem with encryption, this will
>> > > manifest as follows: HFiles will be present. However, you will find
>> many
>> > > IOExceptions in the regionserver logs as they attempt to open the
>> HFiles
>> > > but fail because the data is unreadable.
>> > >
>> > >
>> > > We should start by looking at more basic issues. What could explain
the
>> > > total disappearance of HFiles.
>> > >
>> > >
>> > > Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or
on
>> > > the local filesystem (fs URL starts with file://)?
>> > >
>> > >
>> > > In your email you provide only exceptions printed by the client. What
>> > kind
>> > > of exceptions appear in the regionserver logs? Or appear in the
master
>> > log?
>> > >
>> > > If the logs are large your best bet is to pastebin them and then send
>> the
>> > > URL to the paste in your response.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath <
>> > > shankar.hiremath@huawei.com> wrote:
>> > >
>> > >
>> > > HBase file encryption some inconsistencies observed and data loss
>> > >
>> > > happens after running the hbck tool,
>> > >
>> > > the operation steps are as below.    (one thing what I observed is,
on
>> > >
>> > > startup of HMaster if it is not able to process the WAL file, then
>> > >
>> > > also it moved to /oldWALs)
>> > >
>> > >
>> > > Procedure:
>> > >
>> > > 1. Start the Hbase services (HMaster & region Server) 2. Enable HFile
>> > >
>> > > encryption and WAL file encryption as below, and perform 'table4-0'
>> > >
>> > > put operations (100 records added) <property>
>> > >
>> > > <name>hbase.crypto.keyprovider</name>
>> > >
>> > > <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.crypto.keyprovider.parameters</name>
>> > >
>> > > <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>> > >
>> > > </value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.crypto.master.key.name</name>
>> > >
>> > > <value>hdfs</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hfile.format.version</name>
>> > >
>> > > <value>3</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.regionserver.hlog.reader.impl</name>
>> > >
>> > >
>> > >
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>> > >
>> > > r</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.regionserver.hlog.writer.impl</name>
>> > >
>> > >
>> > >
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
>> > >
>> > > r</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.regionserver.wal.encryption</name>
>> > >
>> > > <value>true</value>
>> > >
>> > > </property>
>> > >
>> > > 3. Machine went down, so all process went down
>> > >
>> > >
>> > > 4. We disabled the WAL file encryption for performance reason, and
>> > >
>> > > keep encryption only for Hfile, as below <property>
>> > >
>> > > <name>hbase.crypto.keyprovider</name>
>> > >
>> > > <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.crypto.keyprovider.parameters</name>
>> > >
>> > > <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>> > >
>> > > </value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.crypto.master.key.name</name>
>> > >
>> > > <value>hdfs</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hfile.format.version</name>
>> > >
>> > > <value>3</value>
>> > >
>> > > </property>
>> > >
>> > > 5. Start the Region Server and query the 'table4-0' data
>> > >
>> > > hbase(main):003:0> count 'table4-0'
>> > >
>> > > ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
>> > >
>> > > table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
>> > >
>> > > online on
>> > >
>> > > XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
>> > >
>> > > ame(HRegionServer.java:2685)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
>> > >
>> > > rver.java:4119)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
>> > >
>> > > java:3066)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
>> > >
>> > > 2.callBlockingMethod(ClientProtos.java:29497)
>> > >
>> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>> > >
>> > > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
>> > >
>> > > cheduler.java:168)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
>> > >
>> > > eduler.java:39)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
>> > >
>> > > r.java:111) at java.lang.Thread.run(Thread.java:662)
>> > >
>> > > 6. Not able to read the data, so we decided to revert back the
>> > >
>> > > configuration (as original) 7. Kill/Stop the Region Server, revert
all
>> > >
>> > > the configurations as original, as below <property>
>> > >
>> > > <name>hbase.crypto.keyprovider</name>
>> > >
>> > > <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.crypto.keyprovider.parameters</name>
>> > >
>> > > <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>> > >
>> > > </value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.crypto.master.key.name</name>
>> > >
>> > > <value>hdfs</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hfile.format.version</name>
>> > >
>> > > <value>3</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.regionserver.hlog.reader.impl</name>
>> > >
>> > >
>> > >
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>> > >
>> > > r</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.regionserver.hlog.writer.impl</name>
>> > >
>> > >
>> > >
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
>> > >
>> > > r</value>
>> > >
>> > > </property>
>> > >
>> > > <property>
>> > >
>> > > <name>hbase.regionserver.wal.encryption</name>
>> > >
>> > > <value>true</value>
>> > >
>> > > </property>
>> > >
>> > > 7. Start the Region Server, and perform the 'table4-0' query
>> > >
>> > > hbase(main):003:0> count 'table4-0'
>> > >
>> > > ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
>> > >
>> > > table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
>> > >
>> > > online on
>> > >
>> > > XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
>> > >
>> > > ame(HRegionServer.java:2685)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
>> > >
>> > > rver.java:4119)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
>> > >
>> > > java:3066)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
>> > >
>> > > 2.callBlockingMethod(ClientProtos.java:29497)
>> > >
>> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>> > >
>> > > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
>> > >
>> > > cheduler.java:168)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
>> > >
>> > > eduler.java:39)
>> > >
>> > > at
>> > >
>> > >
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
>> > >
>> > > r.java:111) at java.lang.Thread.run(Thread.java:662)
>> > >
>> > > 8. Run the hbase hbck to repair, as below ./hbase hbck -details
>> > >
>> > > .........................
>> > >
>> > > Summary:
>> > >
>> > > table1-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table2-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table3-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table4-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table5-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table6-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table7-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table8-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table9-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > hbase:meta is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > hbase:namespace is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > 22 inconsistencies detected.
>> > >
>> > > Status: INCONSISTENT
>> > >
>> > > 2014-07-24 19:13:05,532 INFO [main]
>> > >
>> > > client.HConnectionManager$HConnectionImplementation: Closing master
>> > >
>> > > protocol: MasterService
>> > >
>> > > 2014-07-24 19:13:05,533 INFO [main]
>> > >
>> > > client.HConnectionManager$HConnectionImplementation: Closing
zookeeper
>> > >
>> > > sessionid=0x1475d1611611bcf
>> > >
>> > > 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing
>> > session:
>> > >
>> > > 0x1475d1611611bcf
>> > >
>> > > 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing
>> > >
>> > > client for session: 0x1475d1611611bcf
>> > >
>> > > 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> > >
>> > > zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf,
>> packet::
>> > >
>> > > clientPath:null serverPath:null finished:false header:: 6,-11
>> > replyHeader::
>> > >
>> > > 6,4295102074,0 request:: null response:: null
>> > >
>> > > 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn:
>> > >
>> > > Disconnecting client for session: 0x1475d1611611bcf
>> > >
>> > > 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> > >
>> > > zookeeper.ClientCnxn: An exception was thrown while closing send
>> > >
>> > > thread for session 0x1475d1611611bcf : Unable to read additional data
>> > >
>> > > from server sessionid 0x1475d1611611bcf, likely server has closed
>> > >
>> > > socket
>> > >
>> > > 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
>> > >
>> > > EventThread shut down
>> > >
>> > > 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
>> > >
>> > > 0x1475d1611611bcf closed
>> > >
>> > > shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
>> > >
>> > > 9. Fix the assignments as below
>> > >
>> > > ./hbase hbck -fixAssignments
>> > >
>> > > Summary:
>> > >
>> > > table1-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table2-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table3-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table4-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table5-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table6-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table7-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table8-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table9-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > 0 inconsistencies detected.
>> > >
>> > > Status: OK
>> > >
>> > > 2014-07-24 19:44:55,194 INFO [main]
>> > >
>> > > client.HConnectionManager$HConnectionImplementation: Closing master
>> > >
>> > > protocol: MasterService
>> > >
>> > > 2014-07-24 19:44:55,194 INFO [main]
>> > >
>> > > client.HConnectionManager$HConnectionImplementation: Closing
zookeeper
>> > >
>> > > sessionid=0x2475d15f7b31b73
>> > >
>> > > 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing
>> > session:
>> > >
>> > > 0x2475d15f7b31b73
>> > >
>> > > 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing
>> > >
>> > > client for session: 0x2475d15f7b31b73
>> > >
>> > > 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> > >
>> > > zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73,
>> packet::
>> > >
>> > > clientPath:null serverPath:null finished:false header:: 7,-11
>> > replyHeader::
>> > >
>> > > 7,4295102377,0 request:: null response:: null
>> > >
>> > > 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn:
>> > >
>> > > Disconnecting client for session: 0x2475d15f7b31b73
>> > >
>> > > 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> > >
>> > > zookeeper.ClientCnxn: An exception was thrown while closing send
>> > >
>> > > thread for session 0x2475d15f7b31b73 : Unable to read additional data
>> > >
>> > > from server sessionid 0x2475d15f7b31b73, likely server has closed
>> > >
>> > > socket
>> > >
>> > > 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
>> > >
>> > > 0x2475d15f7b31b73 closed
>> > >
>> > > 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
>> > >
>> > > EventThread shut down
>> > >
>> > > 10. Fix the assignments as below
>> > >
>> > > ./hbase hbck -fixAssignments -fixMeta
>> > >
>> > > Summary:
>> > >
>> > > table1-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table2-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table3-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table4-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table5-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table6-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table7-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table8-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table9-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > 0 inconsistencies detected.
>> > >
>> > > Status: OK
>> > >
>> > > 2014-07-24 19:46:16,290 INFO [main]
>> > >
>> > > client.HConnectionManager$HConnectionImplementation: Closing master
>> > >
>> > > protocol: MasterService
>> > >
>> > > 2014-07-24 19:46:16,290 INFO [main]
>> > >
>> > > client.HConnectionManager$HConnectionImplementation: Closing
zookeeper
>> > >
>> > > sessionid=0x3475d1605321be9
>> > >
>> > > 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing
>> > session:
>> > >
>> > > 0x3475d1605321be9
>> > >
>> > > 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing
>> > >
>> > > client for session: 0x3475d1605321be9
>> > >
>> > > 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> > >
>> > > zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9,
>> packet::
>> > >
>> > > clientPath:null serverPath:null finished:false header:: 6,-11
>> > replyHeader::
>> > >
>> > > 6,4295102397,0 request:: null response:: null
>> > >
>> > > 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn:
>> > >
>> > > Disconnecting client for session: 0x3475d1605321be9
>> > >
>> > > 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> > >
>> > > zookeeper.ClientCnxn: An exception was thrown while closing send
>> > >
>> > > thread for session 0x3475d1605321be9 : Unable to read additional data
>> > >
>> > > from server sessionid 0x3475d1605321be9, likely server has closed
>> > >
>> > > socket
>> > >
>> > > 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
>> > >
>> > > 0x3475d1605321be9 closed
>> > >
>> > > 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
>> > >
>> > > EventThread shut down
>> > >
>> > > hbase(main):006:0> count 'table4-0'
>> > >
>> > > 0 row(s) in 0.0200 seconds
>> > >
>> > > => 0
>> > >
>> > > hbase(main):007:0>
>> > >
>> > > Complete data loss happened,
>> > >
>> > > WALs, oldWALs & /hbase/data/default/table4-0/ does not have any data
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > >
>> > > Best regards,
>> > >
>> > >
>> > >  - Andy
>> > >
>> > >
>> > > Problems worthy of attack prove their worth by hitting back. - Piet
>> Hein
>> > > (via Tom White)
>> > >
>> > >
>> >
>>
>

Re: HBase file encryption, inconsistencies observed and data loss

Posted by ramkrishna vasudevan <ra...@gmail.com>.
I think in the above case though encryption is disabled we will need to use
the securelogreader  only for the new files also that will be created? I
don have code with me now. But if that is the case need to see it as I feel
only the existing one should be read with securelogreader. The new wal
should be read using log reader.
Moving to corrupt folder is fine unless we could bring it back to the main
working for.
Sent from mobile excuse any typos.
On Jul 27, 2014 10:07 AM, "Anoop John" <an...@gmail.com> wrote:

> As per Shankar he can get things work with below configs
>
> <property>
>         <name>hbase.regionserver.hlog.reader.impl</name>
>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
>         <name>hbase.regionserver.hlog.writer.impl</name>
>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
>         <name>hbase.regionserver.wal.encryption</name>
>         <value>false</value>
> </property>
>
> Once the RS crash happened, the config was kept the above way. Note that
> WAL encryption is now disabled, but the reader is still
> SecureProtobufLogReader. The existing WAL files were written with
> encryption and only SecureProtobufLogReader can read them. If that reader
> is not configured, the default ProtobufLogReader is used, and it cannot
> read them back correctly. So this is the issue that Shankar faced.
>
> Also, the fact that the file is not moved under the corrupt logs directory
> when it cannot be read is a concerning thing. We need to look at that.
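>
> A quick way to see whether anything was actually quarantined is to list the
> corrupt-log and archived-WAL directories (a sketch assuming the default
> layout, with the corrupt dir under the HBase root):
>
> hdfs dfs -ls /hbase/.corrupt
> hdfs dfs -ls /hbase/oldWALs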
>
> -Anoop-
>
> On Sat, Jul 26, 2014 at 11:17 PM, Andrew Purtell <andrew.purtell@gmail.com
> >
> wrote:
>
> > My attempt to reproduce this issue:
> >
> > 1. Set up Hadoop 2.4.1 namenode, secondarynamenode, and datanode on a dev
> > box.
> >
> > 2. Set up HBase 0.98.5-SNAPSHOT hosted zk, master, and regionserver also
> on
> > this dev box.
> >
> > 3. Set dfs.replication and
> hbase.regionserver.hlog.tolerable.lowreplication
> > to 1. Set up a keystore and enabled WAL encryption.
> >
> > 4. Created a test table.
> >
> > 5. Used YCSB to write 1000 rows to the test table. No flushes observed.
> >
> > 6. Used the shell to count the number of records in the test table.
> Count =
> > 1000 rows
> >
> > 7. kill -9 the regionserver process.
> >
> > 8. Started a new regionserver process. Observed log splitting and replay
> in
> > the regionserver log, no errors.
> >
> > 9. Used the shell to count the number of records in the test table.
> Count =
> > 1000 rows
> >
> > Tried this a few times.
> >
> > Shankar, can you try running through the above and let us know if the
> > outcome is different?
> >
> >
> >
> > On Sat, Jul 26, 2014 at 8:54 AM, Andrew Purtell <
> andrew.purtell@gmail.com>
> > wrote:
> >
> > > Thanks for the detail. So to summarize:
> > >
> > > 0. HBase 0.98.3 and HDFS 2.4.1
> > >
> > > 1. All data before failure has not yet been flushed so only exists in
> the
> > > WAL files.
> > >
> > > 2. During distributed splitting, the WAL has either not been written
> out
> > > or is unreadable:
> > >
> > >
> > > 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
> > > codec.BaseDecoder: Partial cell read caused by EOF:
> java.io.IOException:
> > > Premature EOF from inputStream
> > >
> > >
> > > 3. This file is still moved to oldWALs even though splitting failed.
> > >
> > > 4. Setting 'hbase.regionserver.wal.encryption' to false allows for data
> > > recovery in your scenario.
> > >
> > > See https://issues.apache.org/jira/browse/HBASE-11595
> > >
> > >
> > >
> > >
> > > On Jul 26, 2014, at 6:50 AM, Shankar hiremath <
> > shankar.hiremath@huawei.com>
> > > wrote:
> > >
> > >
> > > Hi Andrew,
> > >
> > >
> > > Please find the details
> > >
> > >
> > > Hbase 0.98.3 & hadoop 2.4.1
> > >
> > > Hbase root file system on hdfs
> > >
> > >
> > > On the HMaster side there is no failure or error message in the log file.
> > >
> > > On the Region Server side, the below error message was reported:
> > >
> > >
> > > Region Server Log:
> > >
> > > 2014-07-26 19:29:15,904 DEBUG
> [regionserver60020-SendThread(host2:2181)]
> > > zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c,
> packet::
> > > clientPath:null serverPath:null finished:false header:: 172,4
> > >  replyHeader:: 172,4294988825,0  request:: '/hbase/table/hbase:acl,F
> > >  response::
> > >
> >
> #ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
> > >
> > > 2014-07-26 19:29:15,905 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,905 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,905 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,906 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
> > >
> > > 2014-07-26 19:29:15,907 DEBUG
> [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15]
> > > wal.HLogSplitter: Writer thread
> > > Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting
> > >
> > >
> > > 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
> > > codec.BaseDecoder: Partial cell read caused by EOF:
> java.io.IOException:
> > > Premature EOF from inputStream
> > >
> > >
> > > 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> > > wal.HLogSplitter: Finishing writing output logs and closing down.
> > >
> > > 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> > > wal.HLogSplitter: Waiting for split writer threads to finish
> > >
> > > 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> > > wal.HLogSplitter: Split writers finished
> > >
> > > 2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> > > wal.HLogSplitter: Processed 0 edits across 0 regions; log
> > >
> >
> file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta
> > > is corrupted = false progress failed = false
> > >
> > > 2014-07-26 19:29:16,184 DEBUG
> [regionserver60020-SendThread(host2:2181)]
> > > zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c
> > >
> > >
> > >
> > > When I query the table, the data that was in the WAL files (before the
> > > RegionServer machine went down) is not coming back.
> > >
> > > One more thing I observed is that even when the WAL file is not
> > > successfully processed, it is still moved to the /oldWALs folder.
> > >
> > > So when I revert the below 3 configurations on the Region Server side
> > > and restart, the WAL has already been moved to the oldWALs/ folder,
> > > so it will not get processed.
> > >
> > >
> > > <property>
> > >
> > >    <name>hbase.regionserver.hlog.reader.impl</name>
> > >
> > >
> > >
> >
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > >   <name>hbase.regionserver.hlog.writer.impl</name>
> > >
> > >
> > >
> >
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > >   <name>hbase.regionserver.wal.encryption</name>
> > >
> > >   <value>true</value>
> > >
> > > </property>
> > >
> > >
> > >
> > >
> >
> -------------------------------------------------------------------------------------------------------------
> > >
> > >
> > > And one more scenario I tried (Anoop suggested): with the below
> > > configuration (instead of deleting the below 3 config parameters, keep
> > > them all but set only 'hbase.regionserver.wal.encryption=false'), the
> > > encrypted WAL file is processed successfully, and querying the table
> > > gives the WAL data (from before the RegionServer machine went down)
> > > correctly.
> > >
> > >
> > > <property>
> > >
> > >   <name>hbase.regionserver.hlog.reader.impl</name>
> > >
> > >
> > >
> >
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > >   <name>hbase.regionserver.hlog.writer.impl</name>
> > >
> > >
> > >
> >
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > >   <name>hbase.regionserver.wal.encryption</name>
> > >
> > >   <value>false</value>
> > >
> > > </property>
> > >
> > >
> > >
> > > Regards
> > >
> > > -Shankar
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > -----Original Message-----
> > >
> > > From: andrew.purtell@gmail.com [mailto:andrew.purtell@gmail.com
> > > <an...@gmail.com>] On Behalf Of Andrew Purtell
> > >
> > > Sent: 26 July 2014 AM 02:21
> > >
> > > To: user@hbase.apache.org
> > >
> > > Subject: Re: HBase file encryption, inconsistencies observed and data
> > loss
> > >
> > >
> > > Encryption (or the lack of it) doesn't explain missing HFiles.
> > >
> > >
> > > Most likely if you are having a problem with encryption, this will
> > > manifest as follows: HFiles will be present. However, you will find
> many
> > > IOExceptions in the regionserver logs as they attempt to open the
> HFiles
> > > but fail because the data is unreadable.
> > >
> > >
> > > We should start by looking at more basic issues. What could explain the
> > > total disappearance of HFiles.
> > >
> > >
> > > Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on
> > > the local filesystem (fs URL starts with file://)?
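> > >
> > > One way to check is to look at hbase.rootdir in the site config (a
> > > sketch; the file location assumes the usual conf directory):
> > >
> > > grep -A 1 'hbase.rootdir' conf/hbase-site.xml
> > > # e.g. <value>hdfs://hacluster/hbase</value> vs <value>file:///tmp/hbase</value>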
> > >
> > >
> > > In your email you provide only exceptions printed by the client. What
> > kind
> > > of exceptions appear in the regionserver logs? Or appear in the master
> > log?
> > >
> > > If the logs are large your best bet is to pastebin them and then send
> the
> > > URL to the paste in your response.
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath <
> > > shankar.hiremath@huawei.com> wrote:
> > >
> > >
> > > HBase file encryption some inconsistencies observed and data loss
> > >
> > > happens after running the hbck tool,
> > >
> > > the operation steps are as below.    (one thing what I observed is, on
> > >
> > > startup of HMaster if it is not able to process the WAL file, then
> > >
> > > also it moved to /oldWALs)
> > >
> > >
> > > Procedure:
> > >
> > > 1. Start the Hbase services (HMaster & region Server) 2. Enable HFile
> > >
> > > encryption and WAL file encryption as below, and perform 'table4-0'
> > >
> > > put operations (100 records added) <property>
> > >
> > > <name>hbase.crypto.keyprovider</name>
> > >
> > > <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.crypto.keyprovider.parameters</name>
> > >
> > > <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> > >
> > > </value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.crypto.master.key.name</name>
> > >
> > > <value>hdfs</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hfile.format.version</name>
> > >
> > > <value>3</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.regionserver.hlog.reader.impl</name>
> > >
> > >
> > > <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
> > >
> > > r</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.regionserver.hlog.writer.impl</name>
> > >
> > >
> > > <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
> > >
> > > r</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.regionserver.wal.encryption</name>
> > >
> > > <value>true</value>
> > >
> > > </property>
> > >
> > > 3. Machine went down, so all process went down
> > >
> > >
> > > 4. We disabled the WAL file encryption for performance reason, and
> > >
> > > keep encryption only for Hfile, as below <property>
> > >
> > > <name>hbase.crypto.keyprovider</name>
> > >
> > > <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.crypto.keyprovider.parameters</name>
> > >
> > > <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> > >
> > > </value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.crypto.master.key.name</name>
> > >
> > > <value>hdfs</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hfile.format.version</name>
> > >
> > > <value>3</value>
> > >
> > > </property>
> > >
> > > 5. Start the Region Server and query the 'table4-0' data
> > >
> > > hbase(main):003:0> count 'table4-0'
> > >
> > > ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> > >
> > > table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
> > >
> > > online on
> > >
> > > XX-XX-XX-XX,60020,1406209023146
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
> > >
> > > ame(HRegionServer.java:2685)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
> > >
> > > rver.java:4119)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
> > >
> > > java:3066)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
> > >
> > > 2.callBlockingMethod(ClientProtos.java:29497)
> > >
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> > >
> > > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
> > >
> > > cheduler.java:168)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
> > >
> > > eduler.java:39)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
> > >
> > > r.java:111) at java.lang.Thread.run(Thread.java:662)
> > >
> > > 6. Not able to read the data, so we decided to revert back the
> > >
> > > configuration (as original) 7. Kill/Stop the Region Server, revert all
> > >
> > > the configurations as original, as below <property>
> > >
> > > <name>hbase.crypto.keyprovider</name>
> > >
> > > <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.crypto.keyprovider.parameters</name>
> > >
> > > <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> > >
> > > </value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.crypto.master.key.name</name>
> > >
> > > <value>hdfs</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hfile.format.version</name>
> > >
> > > <value>3</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.regionserver.hlog.reader.impl</name>
> > >
> > >
> > > <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
> > >
> > > r</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.regionserver.hlog.writer.impl</name>
> > >
> > >
> > > <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
> > >
> > > r</value>
> > >
> > > </property>
> > >
> > > <property>
> > >
> > > <name>hbase.regionserver.wal.encryption</name>
> > >
> > > <value>true</value>
> > >
> > > </property>
> > >
> > > 7. Start the Region Server, and perform the 'table4-0' query
> > >
> > > hbase(main):003:0> count 'table4-0'
> > >
> > > ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> > >
> > > table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
> > >
> > > online on
> > >
> > > XX-XX-XX-XX,60020,1406209023146
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
> > >
> > > ame(HRegionServer.java:2685)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
> > >
> > > rver.java:4119)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
> > >
> > > java:3066)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
> > >
> > > 2.callBlockingMethod(ClientProtos.java:29497)
> > >
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> > >
> > > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
> > >
> > > cheduler.java:168)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
> > >
> > > eduler.java:39)
> > >
> > > at
> > >
> > > org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
> > >
> > > r.java:111) at java.lang.Thread.run(Thread.java:662)
> > >
> > > 8. Run the hbase hbck to repair, as below ./hbase hbck -details
> > >
> > > .........................
> > >
> > > Summary:
> > >
> > > table1-0 is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > table2-0 is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > table3-0 is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > table4-0 is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > table5-0 is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > table6-0 is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > table7-0 is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > table8-0 is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > table9-0 is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > hbase:meta is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > hbase:namespace is okay.
> > >
> > > Number of regions: 0
> > >
> > > Deployed on:
> > >
> > > 22 inconsistencies detected.
> > >
> > > Status: INCONSISTENT
> > >
> > > 2014-07-24 19:13:05,532 INFO [main]
> > >
> > > client.HConnectionManager$HConnectionImplementation: Closing master
> > >
> > > protocol: MasterService
> > >
> > > 2014-07-24 19:13:05,533 INFO [main]
> > >
> > > client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> > >
> > > sessionid=0x1475d1611611bcf
> > >
> > > 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing
> > session:
> > >
> > > 0x1475d1611611bcf
> > >
> > > 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing
> > >
> > > client for session: 0x1475d1611611bcf
> > >
> > > 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> > >
> > > zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf,
> packet::
> > >
> > > clientPath:null serverPath:null finished:false header:: 6,-11
> > replyHeader::
> > >
> > > 6,4295102074,0 request:: null response:: null
> > >
> > > 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn:
> > >
> > > Disconnecting client for session: 0x1475d1611611bcf
> > >
> > > 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> > >
> > > zookeeper.ClientCnxn: An exception was thrown while closing send
> > >
> > > thread for session 0x1475d1611611bcf : Unable to read additional data
> > >
> > > from server sessionid 0x1475d1611611bcf, likely server has closed
> > >
> > > socket
> > >
> > > 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
> > >
> > > EventThread shut down
> > >
> > > 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
> > >
> > > 0x1475d1611611bcf closed
> > >
> > > shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
> > >
> > > 9. Fix the assignments as below
> > >
> > > ./hbase hbck -fixAssignments
> > >
> > > Summary:
> > >
> > > table1-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table2-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table3-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table4-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table5-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table6-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table7-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table8-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table9-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > 0 inconsistencies detected.
> > >
> > > Status: OK
> > >
> > > 2014-07-24 19:44:55,194 INFO [main]
> > >
> > > client.HConnectionManager$HConnectionImplementation: Closing master
> > >
> > > protocol: MasterService
> > >
> > > 2014-07-24 19:44:55,194 INFO [main]
> > >
> > > client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> > >
> > > sessionid=0x2475d15f7b31b73
> > >
> > > 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing
> > session:
> > >
> > > 0x2475d15f7b31b73
> > >
> > > 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing
> > >
> > > client for session: 0x2475d15f7b31b73
> > >
> > > 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> > >
> > > zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73,
> packet::
> > >
> > > clientPath:null serverPath:null finished:false header:: 7,-11
> > replyHeader::
> > >
> > > 7,4295102377,0 request:: null response:: null
> > >
> > > 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn:
> > >
> > > Disconnecting client for session: 0x2475d15f7b31b73
> > >
> > > 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> > >
> > > zookeeper.ClientCnxn: An exception was thrown while closing send
> > >
> > > thread for session 0x2475d15f7b31b73 : Unable to read additional data
> > >
> > > from server sessionid 0x2475d15f7b31b73, likely server has closed
> > >
> > > socket
> > >
> > > 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
> > >
> > > 0x2475d15f7b31b73 closed
> > >
> > > 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
> > >
> > > EventThread shut down
> > >
> > > 10. Fix the assignments as below
> > >
> > > ./hbase hbck -fixAssignments -fixMeta
> > >
> > > Summary:
> > >
> > > table1-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table2-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table3-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table4-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table5-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table6-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table7-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table8-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > table9-0 is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
> > >
> > > Number of regions: 1
> > >
> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
> > >
> > > 0 inconsistencies detected.
> > >
> > > Status: OK
> > >
> > > 2014-07-24 19:46:16,290 INFO [main]
> > >
> > > client.HConnectionManager$HConnectionImplementation: Closing master
> > >
> > > protocol: MasterService
> > >
> > > 2014-07-24 19:46:16,290 INFO [main]
> > >
> > > client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> > >
> > > sessionid=0x3475d1605321be9
> > >
> > > 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing
> > session:
> > >
> > > 0x3475d1605321be9
> > >
> > > 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing
> > >
> > > client for session: 0x3475d1605321be9
> > >
> > > 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> > >
> > > zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9,
> packet::
> > >
> > > clientPath:null serverPath:null finished:false header:: 6,-11
> > replyHeader::
> > >
> > > 6,4295102397,0 request:: null response:: null
> > >
> > > 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn:
> > >
> > > Disconnecting client for session: 0x3475d1605321be9
> > >
> > > 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> > >
> > > zookeeper.ClientCnxn: An exception was thrown while closing send
> > >
> > > thread for session 0x3475d1605321be9 : Unable to read additional data
> > >
> > > from server sessionid 0x3475d1605321be9, likely server has closed
> > >
> > > socket
> > >
> > > 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
> > >
> > > 0x3475d1605321be9 closed
> > >
> > > 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
> > >
> > > EventThread shut down
> > >
> > > hbase(main):006:0> count 'table4-0'
> > >
> > > 0 row(s) in 0.0200 seconds
> > >
> > > => 0
> > >
> > > hbase(main):007:0>
> > >
> > > Complete data loss happened,
> > >
> > > WALs, oldWALs & /hbase/data/default/table4-0/ does not have any data
> > >
> > >
> > >
> > >
> > > [X]
> > >
> > > This e-mail and its attachments contain confidential information from
> > >
> > > HUAWEI, which is intended only for the person or entity whose address
> > >
> > > is listed above. Any use of the information contained herein in any
> > >
> > > way (including, but not limited to, total or partial disclosure,
> > >
> > > reproduction, or dissemination) by persons other than the intended
> > >
> > > recipient(s) is prohibited. If you receive this e-mail in error,
> > >
> > > please notify the sender by phone or email immediately and delete it!
> > >
> > > [X]
> > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > >
> > >
> > >  - Andy
> > >
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet
> Hein
> > > (via Tom White)
> > >
> > >
> >
>

Re: HBase file encryption, inconsistencies observed and data loss

Posted by Anoop John <an...@gmail.com>.
As per Shankar, he can get things to work with the configs below:

<property>
        <name>hbase.regionserver.hlog.reader.impl</name>
        <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
        <name>hbase.regionserver.hlog.writer.impl</name>
        <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
        <name>hbase.regionserver.wal.encryption</name>
        <value>false</value>
</property>

After the RS crash happened, the config was kept the way shown above. Note
that WAL encryption is now disabled, but the reader is still
SecureProtobufLogReader. The existing WAL files were written with encryption,
and only SecureProtobufLogReader can read them. If that reader is not
configured, the default ProtobufLogReader is used, and it cannot read them
back correctly. So this is the issue that Shankar faced.
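If in doubt whether a given old WAL is still readable with the configured
reader, one rough way to check (a sketch only, assuming the 'hbase hlog'
WAL analyzer shipped with 0.98 and an hbase-site.xml that still names
SecureProtobufLogReader as hbase.regionserver.hlog.reader.impl; the file name
is a placeholder):

./hbase hlog hdfs://hacluster/hbase/oldWALs/<wal-file-name>

With the secure reader configured this should dump the WAL entries; with the
default ProtobufLogReader it will fail on an encrypted file.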

Also, when the file cannot be read, the fact that it is not moved under the
corrupt logs directory is a concerning thing. Need to look at that.

-Anoop-


Re: HBase file encryption, inconsistencies observed and data loss

Posted by Andrew Purtell <an...@gmail.com>.
My attempt to reproduce this issue:

1. Set up Hadoop 2.4.1 namenode, secondarynamenode, and datanode on a dev
box.

2. Set up HBase 0.98.5-SNAPSHOT hosted zk, master, and regionserver also on
this dev box.

3. Set dfs.replication and hbase.regionserver.hlog.tolerable.lowreplication
to 1. Set up a keystore and enabled WAL encryption (see the command-line
sketch after these steps).

4. Created a test table.

5. Used YCSB to write 1000 rows to the test table. No flushes observed.

6. Used the shell to count the number of records in the test table. Count =
1000 rows

7. kill -9 the regionserver process.

8. Started a new regionserver process. Observed log splitting and replay in
the regionserver log, no errors.

9. Used the shell to count the number of records in the test table. Count =
1000 rows

Tried this a few times.
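For step 3 and the failure/recovery cycle in steps 6-9, a rough command-line
sketch (the keystore path, alias, and password below simply mirror Shankar's
configuration rather than my actual run; the pid file location assumes the
default HBASE_PID_DIR of /tmp, and 'usertable' is the YCSB default table
name):

# Step 3: create a JCEKS keystore holding the cluster master key; the alias
# must match hbase.crypto.master.key.name.
keytool -genseckey -keyalg AES -keysize 128 -alias hdfs \
  -storetype jceks -keystore /opt/shankar1/kdc_keytab/hbase.jks \
  -storepass Hadoop@234 -keypass Hadoop@234

# Steps 6-9: count rows, kill the regionserver hard, restart it, count again.
echo "count 'usertable'" | ./hbase shell
kill -9 $(cat /tmp/hbase-$USER-regionserver.pid)
./hbase-daemon.sh start regionserver
echo "count 'usertable'" | ./hbase shell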

Shankar, can you try running through the above and let us know if the
outcome is different?



On Sat, Jul 26, 2014 at 8:54 AM, Andrew Purtell <an...@gmail.com>
wrote:

> Thanks for the detail. So to summarize:
>
> 0. HBase 0.98.3 and HDFS 2.4.1
>
> 1. All data written before the failure had not yet been flushed, so it only
> exists in the WAL files.
>
> 2. During distributed splitting, the WAL has either not been written out
> or is unreadable:
>
>
> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
> codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException:
> Premature EOF from inputStream
>
>
> 3. This file is still moved to oldWALs even though splitting failed.
>
> 4. Setting 'hbase.regionserver.wal.encryption' to false allows for data
> recovery in your scenario.
>
> See https://issues.apache.org/jira/browse/HBASE-11595
>
>
>
>
> On Jul 26, 2014, at 6:50 AM, Shankar hiremath <sh...@huawei.com>
> wrote:
>
>
> Hi Andrew,
>
>
> Please find the details
>
>
> HBase 0.98.3 & Hadoop 2.4.1
>
> HBase root file system is on HDFS
>
>
> On the HMaster side there is no failure or error message in the log file.
>
> On the Region Server side, the below error message was reported:
>
>
> Region Server Log:
>
> 2014-07-26 19:29:15,904 DEBUG [regionserver60020-SendThread(host2:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c, packet::
> clientPath:null serverPath:null finished:false header:: 172,4
>  replyHeader:: 172,4294988825,0  request:: '/hbase/table/hbase:acl,F
>  response::
> #ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
>
> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
>
> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
>
> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
>
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
>
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
>
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
>
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
>
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
>
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
>
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
>
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
>
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
>
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
>
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
>
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
>
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting
>
>
> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
> codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException:
> Premature EOF from inputStream
>
>
> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> wal.HLogSplitter: Finishing writing output logs and closing down.
>
> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> wal.HLogSplitter: Waiting for split writer threads to finish
>
> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> wal.HLogSplitter: Split writers finished
>
> 2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> wal.HLogSplitter: Processed 0 edits across 0 regions; log
> file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta
> is corrupted = false progress failed = false
>
> 2014-07-26 19:29:16,184 DEBUG [regionserver60020-SendThread(host2:2181)]
> zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c
>
>
>
> When I query the table, the data that was in the WAL files (before the
> RegionServer machine went down) is not coming back.
>
> One more thing I observed: even when the WAL file is not successfully
> processed, it is still moved to the /oldWALs folder.
>
> So when I revert back the below 3 configurations on the Region Server side
> and restart, the WAL will not get processed, since it has already been moved
> to the oldWALs/ folder.
>
>
> <property>
>
>    <name>hbase.regionserver.hlog.reader.impl</name>
>
>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>
> </property>
>
> <property>
>
>   <name>hbase.regionserver.hlog.writer.impl</name>
>
>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>
> </property>
>
> <property>
>
>   <name>hbase.regionserver.wal.encryption</name>
>
>   <value>true</value>
>
> </property>
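>
> To confirm where the WAL files ended up after the failed split, a simple
> listing sketch (assuming the default HBase root directory /hbase, as in the
> log paths above):
>
> hdfs dfs -ls /hbase/WALs
> hdfs dfs -ls /hbase/oldWALs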
>
>
>
> -------------------------------------------------------------------------------------------------------------
>
>
> And one more scenario I tried (as Anoop suggested), with the below
> configuration: instead of deleting the below 3 config parameters,
> keep them all but set only 'hbase.regionserver.wal.encryption=false'. The
> encrypted WAL file is getting processed
> successfully, and the table query is returning the WAL data (from before the
> RegionServer machine went down) correctly.
>
>
> <property>
>
>   <name>hbase.regionserver.hlog.reader.impl</name>
>
>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>
> </property>
>
> <property>
>
>   <name>hbase.regionserver.hlog.writer.impl</name>
>
>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>
> </property>
>
> <property>
>
>   <name>hbase.regionserver.wal.encryption</name>
>
>   <value>false</value>
>
> </property>
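>
> With that configuration in place, the recovered data can be re-checked from
> the shell (a minimal, non-interactive sketch):
>
> echo "count 'table4-0'" | ./hbase shell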
>
>
>
> Regards
>
> -Shankar
>
>
>
>
>
>
>
>
> -----Original Message-----
>
> From: andrew.purtell@gmail.com [mailto:andrew.purtell@gmail.com
> <an...@gmail.com>] On Behalf Of Andrew Purtell
>
> Sent: 26 July 2014 AM 02:21
>
> To: user@hbase.apache.org
>
> Subject: Re: HBase file encryption, inconsistencies observed and data loss
>
>
> Encryption (or the lack of it) doesn't explain missing HFiles.
>
>
> Most likely if you are having a problem with encryption, this will
> manifest as follows: HFiles will be present. However, you will find many
> IOExceptions in the regionserver logs as they attempt to open the HFiles
> but fail because the data is unreadable.
>
>
> We should start by looking at more basic issues. What could explain the
> total disappearance of HFiles.
>
>
> Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on
> the local filesystem (fs URL starts with file://)?
>
>
> In your email you provide only exceptions printed by the client. What kind
> of exceptions appear in the regionserver logs? Or appear in the master log?
>
> If the logs are large your best bet is to pastebin them and then send the
> URL to the paste in your response.
>
>
>
>
>
> On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath <
> shankar.hiremath@huawei.com> wrote:
>
>
> HBase file encryption some inconsistencies observed and data loss
>
> happens after running the hbck tool,
>
> the operation steps are as below.    (one thing what I observed is, on
>
> startup of HMaster if it is not able to process the WAL file, then
>
> also it moved to /oldWALs)
>
>
> Procedure:
>
> 1. Start the Hbase services (HMaster & region Server) 2. Enable HFile
>
> encryption and WAL file encryption as below, and perform 'table4-0'
>
> put operations (100 records added) <property>
>
> <name>hbase.crypto.keyprovider</name>
>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>
> </property>
>
> <property>
>
> <name>hbase.crypto.keyprovider.parameters</name>
>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>
> </value>
>
> </property>
>
> <property>
>
> <name>hbase.crypto.master.key.name</name>
>
> <value>hdfs</value>
>
> </property>
>
> <property>
>
> <name>hfile.format.version</name>
>
> <value>3</value>
>
> </property>
>
> <property>
>
> <name>hbase.regionserver.hlog.reader.impl</name>
>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>
> r</value>
>
> </property>
>
> <property>
> <name>hbase.regionserver.hlog.writer.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
> r</value>
> </property>
> <property>
> <name>hbase.regionserver.wal.encryption</name>
> <value>true</value>
> </property>
> 3. Machine went down, so all process went down
>
> 4. We disabled the WAL file encryption for performance reason, and
> keep encryption only for Hfile, as below <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> 5. Start the Region Server and query the 'table4-0' data
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
> online on
> XX-XX-XX-XX,60020,1406209023146
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
> ame(HRegionServer.java:2685)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
> rver.java:4119)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
> java:3066)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
> 2.callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
> cheduler.java:168)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
> eduler.java:39)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
> r.java:111) at java.lang.Thread.run(Thread.java:662)
> 6. Not able to read the data, so we decided to revert back the
> configuration (as original) 7. Kill/Stop the Region Server, revert all
> the configurations as original, as below <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.reader.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
> r</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.writer.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
> r</value>
> </property>
> <property>
> <name>hbase.regionserver.wal.encryption</name>
> <value>true</value>
> </property>
> 7. Start the Region Server, and perform the 'table4-0' query
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
> online on
> XX-XX-XX-XX,60020,1406209023146
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
> ame(HRegionServer.java:2685)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
> rver.java:4119)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
> java:3066)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
> 2.callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
> cheduler.java:168)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
> eduler.java:39)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
> r.java:111) at java.lang.Thread.run(Thread.java:662)
> 8. Run the hbase hbck to repair, as below ./hbase hbck -details
> .........................
> Summary:
> table1-0 is okay.
> Number of regions: 0
> Deployed on:
> table2-0 is okay.
> Number of regions: 0
> Deployed on:
> table3-0 is okay.
> Number of regions: 0
> Deployed on:
> table4-0 is okay.
> Number of regions: 0
> Deployed on:
> table5-0 is okay.
> Number of regions: 0
> Deployed on:
> table6-0 is okay.
> Number of regions: 0
> Deployed on:
> table7-0 is okay.
> Number of regions: 0
> Deployed on:
> table8-0 is okay.
> Number of regions: 0
> Deployed on:
> table9-0 is okay.
> Number of regions: 0
> Deployed on:
> hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> Number of regions: 0
> Deployed on:
> hbase:namespace is okay.
> Number of regions: 0
> Deployed on:
> 22 inconsistencies detected.
> Status: INCONSISTENT
> 2014-07-24 19:13:05,532 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:13:05,533 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing
> client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet::
> clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader::
> 6,4295102074,0 request:: null response:: null
> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn:
> Disconnecting client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send
> thread for session 0x1475d1611611bcf : Unable to read additional data
> from server sessionid 0x1475d1611611bcf, likely server has closed
> socket
> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
> 0x1475d1611611bcf closed
> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
> 9. Fix the assignments as below
> ./hbase hbck -fixAssignments
> Summary:
> table1-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:44:55,194 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:44:55,194 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing
> client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet::
> clientPath:null serverPath:null finished:false header:: 7,-11 replyHeader::
> 7,4295102377,0 request:: null response:: null
> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn:
> Disconnecting client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send
> thread for session 0x2475d15f7b31b73 : Unable to read additional data
> from server sessionid 0x2475d15f7b31b73, likely server has closed
> socket
> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
> 0x2475d15f7b31b73 closed
> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> 10. Fix the assignments as below
> ./hbase hbck -fixAssignments -fixMeta
> Summary:
> table1-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:46:16,290 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:46:16,290 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing
> client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet::
> clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader::
> 6,4295102397,0 request:: null response:: null
> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn:
> Disconnecting client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send
> thread for session 0x3475d1605321be9 : Unable to read additional data
> from server sessionid 0x3475d1605321be9, likely server has closed
> socket
> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
> 0x3475d1605321be9 closed
> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> hbase(main):006:0> count 'table4-0'
> 0 row(s) in 0.0200 seconds
> => 0
> hbase(main):007:0>
> Complete data loss happened,
> WALs, oldWALs & /hbase/data/default/table4-0/ does not have any data
>
> This e-mail and its attachments contain confidential information from
> HUAWEI, which is intended only for the person or entity whose address
> is listed above. Any use of the information contained herein in any
> way (including, but not limited to, total or partial disclosure,
> reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error,
> please notify the sender by phone or email immediately and delete it!
>
> --
> Best regards,
>
>  - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

Re: HBase file encryption, inconsistencies observed and data loss

Posted by Andrew Purtell <an...@gmail.com>.
Thanks for the detail. So to summarize:

0. HBase 0.98.3 and HDFS 2.4.1

1. The data written before the failure had not yet been flushed, so it exists only in the WAL files.

2. During distributed splitting, the WAL has either not been written out or is unreadable:

> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0] codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream

3. The WAL file is still moved to /oldWALs even though splitting failed.

4. Setting 'hbase.regionserver.wal.encryption' to false allows for data recovery in your scenario. 

See https://issues.apache.org/jira/browse/HBASE-11595
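
For reference, the configuration that allowed recovery in your scenario (keep the secure WAL reader/writer classes so the already-written encrypted WALs can still be parsed, but turn encryption off for newly written WALs) is the hbase-site.xml fragment below, which simply mirrors the settings you report:

<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
  <name>hbase.regionserver.hlog.writer.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
  <name>hbase.regionserver.wal.encryption</name>
  <value>false</value>
</property>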



> On Jul 26, 2014, at 6:50 AM, Shankar hiremath <sh...@huawei.com> wrote:
> 
> Hi Andrew,
> 
> Please find the details
> 
> HBase 0.98.3 & Hadoop 2.4.1
> HBase root file system is on HDFS
> 
> On the HMaster side there is no failure or error message in the log file.
> On the Region Server side the below error message was reported:
> 
> Region Server Log:
> 2014-07-26 19:29:15,904 DEBUG [regionserver60020-SendThread(host2:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c, packet:: clientPath:null serverPath:null finished:false header:: 172,4  replyHeader:: 172,4294988825,0  request:: '/hbase/table/hbase:acl,F  response:: #ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting
> 
> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0] codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream
> 
> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Finishing writing output logs and closing down.
> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Waiting for split writer threads to finish
> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Split writers finished
> 2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Processed 0 edits across 0 regions; log file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta is corrupted = false progress failed = false
> 2014-07-26 19:29:16,184 DEBUG [regionserver60020-SendThread(host2:2181)] zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c
> 
> 
> When I query the table data that was only in the WAL files (before the RegionServer machine went down), it does not come back.
> One more thing I observed: even when the WAL file is not successfully processed, it is still moved to the /oldWALs folder.
> So when I revert the below 3 configurations on the Region Server side and restart, the WAL has already been moved to the /oldWALs folder,
> so it does not get processed.
> 
> <property>
>   <name>hbase.regionserver.hlog.reader.impl</name>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
>   <name>hbase.regionserver.hlog.writer.impl</name>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
>   <name>hbase.regionserver.wal.encryption</name>
>   <value>true</value>
> </property>
> 
> -------------------------------------------------------------------------------------------------------------
> 
> And one more scenario I tried (Anoop suggested): with the below configuration (instead of deleting the below 3 config parameters,
> keep all of them but set only 'hbase.regionserver.wal.encryption=false'), the encrypted WAL file is processed
> successfully, and querying the table returns the WAL data (from before the RegionServer machine went down) correctly.
> 
> <property>
>   <name>hbase.regionserver.hlog.reader.impl</name>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
>   <name>hbase.regionserver.hlog.writer.impl</name>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
>   <name>hbase.regionserver.wal.encryption</name>
>   <value>false</value>       
> </property>
> 
> 
> Regards
> -Shankar
> 
> This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!
> 
> 
> 
> 
> 
> -----Original Message-----
> From: andrew.purtell@gmail.com [mailto:andrew.purtell@gmail.com] On Behalf Of Andrew Purtell
> Sent: 26 July 2014 AM 02:21
> To: user@hbase.apache.org
> Subject: Re: HBase file encryption, inconsistencies observed and data loss
> 
> Encryption (or the lack of it) doesn't explain missing HFiles.
> 
> Most likely if you are having a problem with encryption, this will manifest as follows: HFiles will be present. However, you will find many IOExceptions in the regionserver logs as they attempt to open the HFiles but fail because the data is unreadable.
> 
> We should start by looking at more basic issues. What could explain the total disappearance of HFiles?
> 
> Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on the local filesystem (fs URL starts with file://)?
> 
> In your email you provide only exceptions printed by the client. What kind of exceptions appear in the regionserver logs? Or appear in the master log?
> If the logs are large your best bet is to pastebin them and then send the URL to the paste in your response.
> 
> 
> 
> 
>> On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath < shankar.hiremath@huawei.com> wrote:
>> 
>> HBase file encryption some inconsistencies observed and data loss 
>> happens after running the hbck tool,
>> the operation steps are as below.    (one thing what I observed is, on
>> startup of HMaster if it is not able to process the WAL file, then 
>> also it moved to /oldWALs)
>> 
>> Procedure:
>> 1. Start the Hbase services (HMaster & region Server) 2. Enable HFile 
>> encryption and WAL file encryption as below, and perform 'table4-0' 
>> put operations (100 records added) <property> 
>> <name>hbase.crypto.keyprovider</name>
>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>> </property>
>> <property>
>> <name>hbase.crypto.keyprovider.parameters</name>
>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>> </value>
>> </property>
>> <property>
>> <name>hbase.crypto.master.key.name</name>
>> <value>hdfs</value>
>> </property>
>> <property>
>> <name>hfile.format.version</name>
>> <value>3</value>
>> </property>
>> <property>
>> <name>hbase.regionserver.hlog.reader.impl</name>
>> 
>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>> r</value>
>> </property>
>> <property>
>> <name>hbase.regionserver.hlog.writer.impl</name>
>> 
>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
>> r</value>
>> </property>
>> <property>
>> <name>hbase.regionserver.wal.encryption</name>
>> <value>true</value>
>> </property>
>> 3. Machine went down, so all process went down
>> 
>> 4. We disabled the WAL file encryption for performance reason, and 
>> keep encryption only for Hfile, as below <property> 
>> <name>hbase.crypto.keyprovider</name>
>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>> </property>
>> <property>
>> <name>hbase.crypto.keyprovider.parameters</name>
>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>> </value>
>> </property>
>> <property>
>> <name>hbase.crypto.master.key.name</name>
>> <value>hdfs</value>
>> </property>
>> <property>
>> <name>hfile.format.version</name>
>> <value>3</value>
>> </property>
>> 5. Start the Region Server and query the 'table4-0' data 
>> hbase(main):003:0> count 'table4-0'
>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region 
>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not 
>> online on
>> XX-XX-XX-XX,60020,1406209023146
>> at
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
>> ame(HRegionServer.java:2685)
>> at
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
>> rver.java:4119)
>> at
>> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
>> java:3066)
>> at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
>> 2.callBlockingMethod(ClientProtos.java:29497)
>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>> at
>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
>> cheduler.java:168)
>> at
>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
>> eduler.java:39)
>> at
>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
>> r.java:111) at java.lang.Thread.run(Thread.java:662)
>> 6. Not able to read the data, so we decided to revert back the 
>> configuration (as original) 7. Kill/Stop the Region Server, revert all 
>> the configurations as original, as below <property> 
>> <name>hbase.crypto.keyprovider</name>
>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>> </property>
>> <property>
>> <name>hbase.crypto.keyprovider.parameters</name>
>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
>> </value>
>> </property>
>> <property>
>> <name>hbase.crypto.master.key.name</name>
>> <value>hdfs</value>
>> </property>
>> <property>
>> <name>hfile.format.version</name>
>> <value>3</value>
>> </property>
>> <property>
>> <name>hbase.regionserver.hlog.reader.impl</name>
>> 
>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
>> r</value>
>> </property>
>> <property>
>> <name>hbase.regionserver.hlog.writer.impl</name>
>> 
>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
>> r</value>
>> </property>
>> <property>
>> <name>hbase.regionserver.wal.encryption</name>
>> <value>true</value>
>> </property>
>> 7. Start the Region Server, and perform the 'table4-0' query 
>> hbase(main):003:0> count 'table4-0'
>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region 
>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not 
>> online on
>> XX-XX-XX-XX,60020,1406209023146
>> at
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
>> ame(HRegionServer.java:2685)
>> at
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
>> rver.java:4119)
>> at
>> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
>> java:3066)
>> at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
>> 2.callBlockingMethod(ClientProtos.java:29497)
>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>> at
>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
>> cheduler.java:168)
>> at
>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
>> eduler.java:39)
>> at
>> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
>> r.java:111) at java.lang.Thread.run(Thread.java:662)
>> 8. Run the hbase hbck to repair, as below ./hbase hbck -details 
>> .........................
>> Summary:
>> table1-0 is okay.
>> Number of regions: 0
>> Deployed on:
>> table2-0 is okay.
>> Number of regions: 0
>> Deployed on:
>> table3-0 is okay.
>> Number of regions: 0
>> Deployed on:
>> table4-0 is okay.
>> Number of regions: 0
>> Deployed on:
>> table5-0 is okay.
>> Number of regions: 0
>> Deployed on:
>> table6-0 is okay.
>> Number of regions: 0
>> Deployed on:
>> table7-0 is okay.
>> Number of regions: 0
>> Deployed on:
>> table8-0 is okay.
>> Number of regions: 0
>> Deployed on:
>> table9-0 is okay.
>> Number of regions: 0
>> Deployed on:
>> hbase:meta is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>> Number of regions: 0
>> Deployed on:
>> hbase:namespace is okay.
>> Number of regions: 0
>> Deployed on:
>> 22 inconsistencies detected.
>> Status: INCONSISTENT
>> 2014-07-24 19:13:05,532 INFO [main]
>> client.HConnectionManager$HConnectionImplementation: Closing master
>> protocol: MasterService
>> 2014-07-24 19:13:05,533 INFO [main]
>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper 
>> sessionid=0x1475d1611611bcf
>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session:
>> 0x1475d1611611bcf
>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing 
>> client for session: 0x1475d1611611bcf
>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet::
>> clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader::
>> 6,4295102074,0 request:: null response:: null
>> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: 
>> Disconnecting client for session: 0x1475d1611611bcf
>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> zookeeper.ClientCnxn: An exception was thrown while closing send 
>> thread for session 0x1475d1611611bcf : Unable to read additional data 
>> from server sessionid 0x1475d1611611bcf, likely server has closed 
>> socket
>> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
>> EventThread shut down
>> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
>> 0x1475d1611611bcf closed
>> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
>> 9. Fix the assignments as below
>> ./hbase hbck -fixAssignments
>> Summary:
>> table1-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table2-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table3-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table4-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table5-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table6-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table7-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table8-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table9-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> 0 inconsistencies detected.
>> Status: OK
>> 2014-07-24 19:44:55,194 INFO [main]
>> client.HConnectionManager$HConnectionImplementation: Closing master
>> protocol: MasterService
>> 2014-07-24 19:44:55,194 INFO [main]
>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>> sessionid=0x2475d15f7b31b73
>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session:
>> 0x2475d15f7b31b73
>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing 
>> client for session: 0x2475d15f7b31b73
>> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet::
>> clientPath:null serverPath:null finished:false header:: 7,-11 replyHeader::
>> 7,4295102377,0 request:: null response:: null
>> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: 
>> Disconnecting client for session: 0x2475d15f7b31b73
>> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> zookeeper.ClientCnxn: An exception was thrown while closing send 
>> thread for session 0x2475d15f7b31b73 : Unable to read additional data 
>> from server sessionid 0x2475d15f7b31b73, likely server has closed 
>> socket
>> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
>> 0x2475d15f7b31b73 closed
>> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
>> EventThread shut down
>> 10. Fix the assignments as below
>> ./hbase hbck -fixAssignments -fixMeta
>> Summary:
>> table1-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table2-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table3-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table4-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table5-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table6-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table7-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table8-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> table9-0 is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
>> Number of regions: 1
>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>> 0 inconsistencies detected.
>> Status: OK
>> 2014-07-24 19:46:16,290 INFO [main]
>> client.HConnectionManager$HConnectionImplementation: Closing master
>> protocol: MasterService
>> 2014-07-24 19:46:16,290 INFO [main]
>> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
>> sessionid=0x3475d1605321be9
>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session:
>> 0x3475d1605321be9
>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing 
>> client for session: 0x3475d1605321be9
>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet::
>> clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader::
>> 6,4295102397,0 request:: null response:: null
>> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: 
>> Disconnecting client for session: 0x3475d1605321be9
>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
>> zookeeper.ClientCnxn: An exception was thrown while closing send 
>> thread for session 0x3475d1605321be9 : Unable to read additional data 
>> from server sessionid 0x3475d1605321be9, likely server has closed 
>> socket
>> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
>> 0x3475d1605321be9 closed
>> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
>> EventThread shut down
>> hbase(main):006:0> count 'table4-0'
>> 0 row(s) in 0.0200 seconds
>> => 0
>> hbase(main):007:0>
>> Complete data loss happened,
>> WALs, oldWALs & /hbase/data/default/table4-0/ does not have any data
>> 
>> 
>> 
>> This e-mail and its attachments contain confidential information from 
>> HUAWEI, which is intended only for the person or entity whose address 
>> is listed above. Any use of the information contained herein in any 
>> way (including, but not limited to, total or partial disclosure, 
>> reproduction, or dissemination) by persons other than the intended 
>> recipient(s) is prohibited. If you receive this e-mail in error, 
>> please notify the sender by phone or email immediately and delete it!
> 
> 
> --
> Best regards,
> 
>  - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

RE: HBase file encryption, inconsistencies observed and data loss

Posted by Shankar hiremath <sh...@huawei.com>.
Hi Andrew,

Please find the details

HBase 0.98.3 & Hadoop 2.4.1
HBase root file system is on HDFS

On the HMaster side there is no failure or error message in the log file.
On the Region Server side the below error message was reported:

Region Server Log:
2014-07-26 19:29:15,904 DEBUG [regionserver60020-SendThread(host2:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c, packet:: clientPath:null serverPath:null finished:false header:: 172,4  replyHeader:: 172,4294988825,0  request:: '/hbase/table/hbase:acl,F  response:: #ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting

2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0] codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream

2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Finishing writing output logs and closing down.
2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Waiting for split writer threads to finish
2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Split writers finished
2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Processed 0 edits across 0 regions; log file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta is corrupted = false progress failed = false
2014-07-26 19:29:16,184 DEBUG [regionserver60020-SendThread(host2:2181)] zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c


When I query the table data that was only in the WAL files (before the RegionServer machine went down), it does not come back.
One more thing I observed: even when the WAL file is not successfully processed, it is still moved to the /oldWALs folder.
So when I revert the below 3 configurations on the Region Server side and restart, the WAL has already been moved to the /oldWALs folder,
so it does not get processed.

<property>
	<name>hbase.regionserver.hlog.reader.impl</name>
	<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
	<name>hbase.regionserver.hlog.writer.impl</name>
	<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
	<name>hbase.regionserver.wal.encryption</name>
	<value>true</value>
</property>
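
A quick way to confirm this state is to list the WAL directories directly; a sketch, assuming the default /hbase root directory (adjust the paths to your hbase.rootdir):

hdfs dfs -ls /hbase/WALs
hdfs dfs -ls /hbase/oldWALs

Once a file only shows up under /hbase/oldWALs, a restarted Region Server will not replay it.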

-------------------------------------------------------------------------------------------------------------

And one more scenario I tried (Anoop suggested): with the below configuration (instead of deleting the below 3 config parameters,
keep all of them but set only 'hbase.regionserver.wal.encryption=false'), the encrypted WAL file is processed
successfully, and querying the table returns the WAL data (from before the RegionServer machine went down) correctly.

<property>
	<name>hbase.regionserver.hlog.reader.impl</name>
	<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
	<name>hbase.regionserver.hlog.writer.impl</name>
	<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
	<name>hbase.regionserver.wal.encryption</name>
	<value>false</value>       
</property>
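
To double-check that an archived WAL is actually readable with a given reader configuration, the WAL pretty-printer can be pointed at a file; a sketch only, assuming the tool is exposed as 'hbase hlog' in this 0.98 build (later releases rename it to 'hbase wal') and using a placeholder file name:

./hbase hlog hdfs://hacluster/hbase/oldWALs/<wal-file-name>

If the dump succeeds with SecureProtobufLogReader configured, the file itself is readable under that configuration.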


Regards
-Shankar

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!





-----Original Message-----
From: andrew.purtell@gmail.com [mailto:andrew.purtell@gmail.com] On Behalf Of Andrew Purtell
Sent: 26 July 2014 AM 02:21
To: user@hbase.apache.org
Subject: Re: HBase file encryption, inconsistencies observed and data loss

Encryption (or the lack of it) doesn't explain missing HFiles.

Most likely if you are having a problem with encryption, this will manifest as follows: HFiles will be present. However, you will find many IOExceptions in the regionserver logs as they attempt to open the HFiles but fail because the data is unreadable.

We should start by looking at more basic issues. What could explain the total disappearance of HFiles?

Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on the local filesystem (fs URL starts with file://)?

In your email you provide only exceptions printed by the client. What kind of exceptions appear in the regionserver logs? Or appear in the master log?
If the logs are large your best bet is to pastebin them and then send the URL to the paste in your response.




On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath < shankar.hiremath@huawei.com> wrote:

> HBase file encryption some inconsistencies observed and data loss 
> happens after running the hbck tool,
> the operation steps are as below.    (one thing what I observed is, on
> startup of HMaster if it is not able to process the WAL file, then 
> also it moved to /oldWALs)
>
> Procedure:
> 1. Start the Hbase services (HMaster & region Server) 2. Enable HFile 
> encryption and WAL file encryption as below, and perform 'table4-0' 
> put operations (100 records added) <property> 
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.reader.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
> r</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.writer.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
> r</value>
> </property>
> <property>
> <name>hbase.regionserver.wal.encryption</name>
> <value>true</value>
> </property>
> 3. Machine went down, so all process went down
>
> 4. We disabled the WAL file encryption for performance reason, and 
> keep encryption only for Hfile, as below <property> 
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> 5. Start the Region Server and query the 'table4-0' data 
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region 
> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not 
> online on
> XX-XX-XX-XX,60020,1406209023146
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
> ame(HRegionServer.java:2685)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
> rver.java:4119)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
> java:3066)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
> 2.callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
> cheduler.java:168)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
> eduler.java:39)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
> r.java:111) at java.lang.Thread.run(Thread.java:662)
> 6. Not able to read the data, so we decided to revert back the 
> configuration (as original) 7. Kill/Stop the Region Server, revert all 
> the configurations as original, as below <property> 
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.reader.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReade
> r</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.writer.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWrite
> r</value>
> </property>
> <property>
> <name>hbase.regionserver.wal.encryption</name>
> <value>true</value>
> </property>
> 7. Start the Region Server, and perform the 'table4-0' query 
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region 
> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not 
> online on
> XX-XX-XX-XX,60020,1406209023146
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedN
> ame(HRegionServer.java:2685)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSe
> rver.java:4119)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.
> java:3066)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
> 2.callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcS
> cheduler.java:168)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSch
> eduler.java:39)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcSchedule
> r.java:111) at java.lang.Thread.run(Thread.java:662)
> 8. Run the hbase hbck to repair, as below ./hbase hbck -details 
> .........................
> Summary:
> table1-0 is okay.
> Number of regions: 0
> Deployed on:
> table2-0 is okay.
> Number of regions: 0
> Deployed on:
> table3-0 is okay.
> Number of regions: 0
> Deployed on:
> table4-0 is okay.
> Number of regions: 0
> Deployed on:
> table5-0 is okay.
> Number of regions: 0
> Deployed on:
> table6-0 is okay.
> Number of regions: 0
> Deployed on:
> table7-0 is okay.
> Number of regions: 0
> Deployed on:
> table8-0 is okay.
> Number of regions: 0
> Deployed on:
> table9-0 is okay.
> Number of regions: 0
> Deployed on:
> hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> Number of regions: 0
> Deployed on:
> hbase:namespace is okay.
> Number of regions: 0
> Deployed on:
> 22 inconsistencies detected.
> Status: INCONSISTENT
> 2014-07-24 19:13:05,532 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:13:05,533 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper 
> sessionid=0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing 
> client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet::
> clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader::
> 6,4295102074,0 request:: null response:: null
> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: 
> Disconnecting client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send 
> thread for session 0x1475d1611611bcf : Unable to read additional data 
> from server sessionid 0x1475d1611611bcf, likely server has closed 
> socket
> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
> 0x1475d1611611bcf closed
> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
> 9. Fix the assignments as below
> ./hbase hbck -fixAssignments
> Summary:
> table1-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:44:55,194 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:44:55,194 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing 
> client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet::
> clientPath:null serverPath:null finished:false header:: 7,-11 replyHeader::
> 7,4295102377,0 request:: null response:: null
> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: 
> Disconnecting client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send 
> thread for session 0x2475d15f7b31b73 : Unable to read additional data 
> from server sessionid 0x2475d15f7b31b73, likely server has closed 
> socket
> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
> 0x2475d15f7b31b73 closed
> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> 10. Fix the assignments as below
> ./hbase hbck -fixAssignments -fixMeta
> Summary:
> table1-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:46:16,290 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:46:16,290 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing 
> client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet::
> clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader::
> 6,4295102397,0 request:: null response:: null
> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: 
> Disconnecting client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send 
> thread for session 0x3475d1605321be9 : Unable to read additional data 
> from server sessionid 0x3475d1605321be9, likely server has closed 
> socket
> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
> 0x3475d1605321be9 closed
> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> hbase(main):006:0> count 'table4-0'
> 0 row(s) in 0.0200 seconds
> => 0
> hbase(main):007:0>
> Complete data loss happened,
> WALs, oldWALs & /hbase/data/default/table4-0/ does not have any data
>
>
>
> This e-mail and its attachments contain confidential information from 
> HUAWEI, which is intended only for the person or entity whose address 
> is listed above. Any use of the information contained herein in any 
> way (including, but not limited to, total or partial disclosure, 
> reproduction, or dissemination) by persons other than the intended 
> recipient(s) is prohibited. If you receive this e-mail in error, 
> please notify the sender by phone or email immediately and delete it!
>
>
>
>
>
>


--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

Re: HBase file encryption, inconsistencies observed and data loss

Posted by Andrew Purtell <ap...@apache.org>.
Encryption (or the lack of it) doesn't explain missing HFiles.

Most likely if you are having a problem with encryption, this will manifest
as follows: HFiles will be present. However, you will find many
IOExceptions in the regionserver logs as they attempt to open the HFiles
but fail because the data is unreadable.

We should start by looking at more basic issues. What could explain the
total disappearance of HFiles?

Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on the
local filesystem (fs URL starts with file://)?

In your email you provide only exceptions printed by the client. What kind
of exceptions appear in the regionserver logs? Or appear in the master log?
If the logs are large your best bet is to pastebin them and then send the
URL to the paste in your response.
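
As an illustration only (the values here are examples, not taken from your cluster), an HDFS-backed deployment carries something like this in hbase-site.xml, while a local-filesystem deployment has a file:// URL in the same property:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode.example.org:8020/hbase</value>
</property>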




On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath <
shankar.hiremath@huawei.com> wrote:

> HBase file encryption some inconsistencies observed and data loss happens
> after running the hbck tool,
> the operation steps are as below.    (one thing what I observed is, on
> startup of HMaster if it is not able to process the WAL file, then also it
> moved to /oldWALs)
>
> Procedure:
> 1. Start the Hbase services (HMaster & region Server)
> 2. Enable HFile encryption and WAL file encryption as below, and perform
> 'table4-0' put operations (100 records added)
> <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.reader.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.writer.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
> <name>hbase.regionserver.wal.encryption</name>
> <value>true</value>
> </property>
> 3. Machine went down, so all process went down
>
> 4. We disabled the WAL file encryption for performance reason, and keep
> encryption only for Hfile, as below
> <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
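
One detail that matters for the recovery attempted in the following steps:
as far as I understand the secure WAL classes, SecureProtobufLogReader can
also read unencrypted WALs, so when switching off
hbase.regionserver.wal.encryption it is safer to leave the reader (and
writer) implementations configured; otherwise WALs written while encryption
was enabled cannot be replayed at startup. Roughly:

<property>
<name>hbase.regionserver.hlog.reader.impl</name>
<value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
<name>hbase.regionserver.wal.encryption</name>
<value>false</value>
</property>
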
> 5. Start the Region Server and query the 'table4-0' data
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on
> XX-XX-XX-XX,60020,1406209023146
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
> at java.lang.Thread.run(Thread.java:662)
> 6. Not able to read the data, so we decided to revert back the
> configuration (as original)
> 7. Kill/Stop the Region Server, revert all the configurations as original,
> as below
> <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.reader.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.writer.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
> <name>hbase.regionserver.wal.encryption</name>
> <value>true</value>
> </property>
> 8. Start the Region Server, and perform the 'table4-0' query
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on
> XX-XX-XX-XX,60020,1406209023146
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
> at java.lang.Thread.run(Thread.java:662)
> 9. Run the hbase hbck to repair, as below
> ./hbase hbck -details
> .........................
> Summary:
> table1-0 is okay.
> Number of regions: 0
> Deployed on:
> table2-0 is okay.
> Number of regions: 0
> Deployed on:
> table3-0 is okay.
> Number of regions: 0
> Deployed on:
> table4-0 is okay.
> Number of regions: 0
> Deployed on:
> table5-0 is okay.
> Number of regions: 0
> Deployed on:
> table6-0 is okay.
> Number of regions: 0
> Deployed on:
> table7-0 is okay.
> Number of regions: 0
> Deployed on:
> table8-0 is okay.
> Number of regions: 0
> Deployed on:
> table9-0 is okay.
> Number of regions: 0
> Deployed on:
> hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:acl is okay.
> Number of regions: 0
> Deployed on:
> hbase:namespace is okay.
> Number of regions: 0
> Deployed on:
> 22 inconsistencies detected.
> Status: INCONSISTENT
> 2014-07-24 19:13:05,532 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:13:05,533 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing client
> for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet::
> clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader::
> 6,4295102074,0 request:: null response:: null
> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: Disconnecting
> client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send thread for
> session 0x1475d1611611bcf : Unable to read additional data from server
> sessionid 0x1475d1611611bcf, likely server has closed socket
> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
> 0x1475d1611611bcf closed
> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
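
Worth noting before the repair steps that follow: -details only reports;
the -fix* options rewrite hbase:meta and region assignments, so it is cheap
insurance to capture the read-only report first, along the lines of:

./hbase hbck -details > hbck-before-fix.txt 2>&1
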
> 10. Fix the assignments as below
> ./hbase hbck -fixAssignments
> Summary:
> table1-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:acl is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:namespace is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:44:55,194 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:44:55,194 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing client
> for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet::
> clientPath:null serverPath:null finished:false header:: 7,-11 replyHeader::
> 7,4295102377,0 request:: null response:: null
> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: Disconnecting
> client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send thread for
> session 0x2475d15f7b31b73 : Unable to read additional data from server
> sessionid 0x2475d15f7b31b73, likely server has closed socket
> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
> 0x2475d15f7b31b73 closed
> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> 11. Fix the assignments and hbase:meta as below
> ./hbase hbck -fixAssignments -fixMeta
> Summary:
> table1-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:acl is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:namespace is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:46:16,290 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:46:16,290 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing client
> for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet::
> clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader::
> 6,4295102397,0 request:: null response:: null
> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: Disconnecting
> client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send thread for
> session 0x3475d1605321be9 : Unable to read additional data from server
> sessionid 0x3475d1605321be9, likely server has closed socket
> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
> 0x3475d1605321be9 closed
> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> hbase(main):006:0> count 'table4-0'
> 0 row(s) in 0.0200 seconds
> => 0
> hbase(main):007:0>
> Complete data loss happened,
> WALs, oldWALs & /hbase/data/default/table4-0/ does not have any data
>
>
>
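
Before reaching for hbck in a situation like this, it is worth confirming
at the HDFS level whether the region directories and store files are really
gone (assuming the default layout under hbase.rootdir):

hdfs dfs -ls -R /hbase/data/default/table4-0
hdfs dfs -ls -R /hbase/oldWALs

If those listings come back empty, no key configuration will bring the rows
back, and as far as I understand it hbck -fixMeta does not recover store
files; it only reconciles hbase:meta with whatever is on disk, which is why
the table ends up reported as consistent but empty.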


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)