You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by JH Lin <ma...@163.com> on 2017/12/11 15:57:35 UTC

is it unnecessary for 'flushlock' in Store?

hi all, I read some code recently about the flow of flush-table which entried in HBaseAdmin#flush(tableOrRegion).
i found that at least four locks in this flow(0.94):
-region lock
-region updatelock
-hlog cacheFlushLock
-store flushlock
(awesome locks usages)
------------------------
then i have certain questions in it:
A.why not flush stores concurrently?
-this has been fixed with hbase-6466 related by hbase-6980,well done!
B.why not flush regions concurrently? 
-as in some scenarios only one family in a table, the fixed question A will not gain any performance improved.
 so i dive in to master trunk and found that the flow has changed a lot :it has been delivered to master to do it(but i am not care the details in it in fact)
publicvoidflush(final TableName tableName)throws IOException {
1163checkTableExists(tableName);
1164if(isTableDisabled(tableName)) {
1165 LOG.info("Table is disabled: "+ tableName.getNameAsString());
1166return;
1167}
1168execProcedure("flush-table-proc", tableName.getNameAsString(),new HashMap<>());
1169}
publicbyte[]execProcedureWithReturn(String signature, String instance, Map<String, String> props)
2744throws IOException {
2745 ProcedureDescription desc = ProtobufUtil.buildProcedureDescription(signature, instance, props);
2746final ExecProcedureRequest request =
2747 ExecProcedureRequest.newBuilder().setProcedure(desc).build();
2748// run the procedure on the master
2749 ExecProcedureResponse response =executeCallable(
2750new MasterCallable<ExecProcedureResponse>(getConnection(),getRpcControllerFactory()) {
2751@Override
2752protected ExecProcedureResponse rpcCall()throws Exception {
2753return master.execProcedureWithRet(getRpcController(), request);
2754}
2755});
2756
2757return response.hasReturnData() ? response.getReturnData().toByteArray() : null;
2758}


C.is it the 'flushlock' unnecessary in Store? ( **this thread FOCUS**)
  since this lock is used in only one place(no other lock racers) :Store#internalFlushCache(),i.e..after grabbing scanner on snapshot and before finishing writing to hfile.
  and i saw this 'flushlock' retains in master trunk:
public List<Path>flushSnapshot(MemStoreSnapshot snapshot,long cacheFlushId,
48 MonitoredTask status, ThroughputController throughputController,
49 FlushLifeCycleTracker tracker)throws IOException {
50 ArrayList<Path> result =new ArrayList<>();
51int cellsCount = snapshot.getCellsCount();
52if(cellsCount ==0)return result;// don't flush if there are no entries
53
54// Use a store scanner to find which rows to flush.
55long smallestReadPoint = store.getSmallestReadPoint();
56 InternalScanner scanner =createScanner(snapshot.getScanners(), smallestReadPoint, tracker);
57 StoreFileWriter writer;
58try{
59// TODO: We can fail in the below block before we complete adding this flush to
60// list of store files. Add cleanup of anything put on filesystem if we fail.
61synchronized(flushLock) {
62 status.setStatus("Flushing "+ store +": creating writer");
63// Write the map out to the disk
64 writer = store.createWriterInTmp(cellsCount, store.getColumnFamilyDescriptor().getCompressionType(),
65/* isCompaction = */ false,
66/* includeMVCCReadpoint = */ true,
67/* includesTags = */ snapshot.isTagsPresent(),
68/* shouldDropBehind = */ false);
69 IOException e = null;
70try{
71performFlush(scanner, writer, smallestReadPoint, throughputController);
72}catch(IOException ioe) {
73 e = ioe;
74// throw the exception out
75throw ioe;
76}finally{
77if(e != null) {
78 writer.close();
79}else{
80finalizeWriter(writer, cacheFlushId, status);
81}
82}
83}
84}finally{
85 scanner.close();
86}
87 LOG.info("Flushed, sequenceid="+ cacheFlushId +", memsize="
88+ StringUtils.TraditionalBinaryPrefix.long2String(snapshot.getDataSize(),"",1) +
89", hasBloomFilter="+ writer.hasGeneralBloom() +
90", into tmp file "+ writer.getPath());
91 result.add(writer.getPath());
92return result;
93}
94}


  in fact, the 'cacheFlushLock' will do this duty of it.so i think it's unnecessary or only for tests as i found a TestStore will call it :
internalFlushCache(SortedSet<KeyValue>, long, TimeRangeTracker, AtomicLong, MonitoredTask) : Path - org.apache.hadoop.hbase.regionserver.Store
flushCache(long, SortedSet<KeyValue>, TimeRangeTracker, AtomicLong, MonitoredTask) : Path - org.apache.hadoop.hbase.regionserver.Store
flushCache(MonitoredTask) : void - org.apache.hadoop.hbase.regionserver.Store.StoreFlusherImpl
flushStore(Store, long) : void - org.apache.hadoop.hbase.regionserver.TestStore
internalFlushcache(HLog, long, MonitoredTask) : boolean - org.apache.hadoop.hbase.regionserver.HRegion
  that means in normal flow only HRegion#intervalFlushcache() will call it but TestStore is a inserted flow(ie.unpossible flow)
  >>I just wonder is it redundant in fact though i think even if the removal of this lock will not significantly improve  performance .<<
any input is appreciated ,thanks
--JH Lin