Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2018/12/14 22:49:59 UTC

[GitHub] keith-turner edited a comment on issue #800: Bulk import process can result in a lost file

URL: https://github.com/apache/accumulo/issues/800#issuecomment-447499007
 
 
   Looking at the code, I found a potential problem.  Each bulk import has a unique FATE transaction id stored in ZooKeeper (ZK).  There is code that prevents RPCs related to the bulk import from running once the FATE transaction id has been deleted from ZK.  There is also code to wait for all active RPCs to finish.  So the bulk import FATE op deletes the id from ZK and then waits for all tservers to complete any RPCs that were active before it was deleted.
   
   The bulk import FATE op makes RPCs to intermediate tservers that inspect files.  Once an intermediate tserver determines where a file goes, it makes an RPC to the tserver that should load the file.  The problem is that only the intermediate RPC checks whether the transaction id is active.  Really, the final RPC should be doing this check.
   
   Below are some places in the code where this is all happening.
   
    * [CompleteBulkImport line 42](https://github.com/apache/accumulo/blob/rel/1.9.2/server/master/src/main/java/org/apache/accumulo/master/tableOps/CompleteBulkImport.java#L42) deletes the transaction id from ZK
    * [CopyFailed line 75](https://github.com/apache/accumulo/blob/rel/1.9.2/server/master/src/main/java/org/apache/accumulo/master/tableOps/CopyFailed.java#L75) waits for active RPCs to complete
    * [ClientServiceHandler line 345](https://github.com/apache/accumulo/blob/rel/1.9.2/server/base/src/main/java/org/apache/accumulo/server/client/ClientServiceHandler.java#L345) runs the intermediate RPC to inspect files.  This RPC is run using TransactionWatcher.  TransactionWatcher will not run the RPC if the id does not exist in ZK.  If it does exist, it increments a counter for the number of RPCs running for that transaction.
    * [TabletServer line 463](https://github.com/apache/accumulo/blob/rel/1.9.2/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L463) is the RPC that actually loads a file into a tablet.  This RPC is not run using TransactionWatcher, and I think it should be.
   
   I think the intermediate RPC should stop using TransactionWatcher and the final RPC should start using TransactionWatcher.  I don't think there is a benefit to the intermediate one using it and it puts more load on ZK.
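   
   To make the check-and-count behavior described above concrete, here is a minimal, self-contained sketch of that kind of guard.  It is not the actual TransactionWatcher code: the class `TxGuard`, its methods, and the in-memory set standing in for ZK are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for the guard described above; NOT Accumulo's TransactionWatcher API.
class TxGuard {
  // Assumption: an in-memory set stands in for the transaction ids stored in ZooKeeper.
  private final Set<Long> activeIds = new HashSet<>();
  // Number of guarded RPCs currently running for each transaction id.
  private final Map<Long, Integer> running = new HashMap<>();

  // Models creating the transaction id when the bulk import starts.
  synchronized void register(long tid) {
    activeIds.add(tid);
  }

  // Models the FATE op deleting the id; later run() calls for tid are rejected.
  synchronized void delete(long tid) {
    activeIds.remove(tid);
  }

  // What the FATE op would poll after delete() to wait for in-flight RPCs to drain.
  synchronized int runningCount(long tid) {
    return running.getOrDefault(tid, 0);
  }

  // Runs rpc only while tid is still active, counting it as in flight for its duration.
  <T> T run(long tid, Supplier<T> rpc) {
    synchronized (this) {
      // Check and increment under one lock so a concurrent delete() either
      // rejects this call or sees it in the running count.
      if (!activeIds.contains(tid)) {
        throw new IllegalStateException("Transaction " + tid + " is no longer active");
      }
      running.merge(tid, 1, Integer::sum);
    }
    try {
      return rpc.get();
    } finally {
      synchronized (this) {
        running.merge(tid, -1, Integer::sum);
      }
    }
  }
}
```

   Under this sketch, the proposed change amounts to wrapping the final load RPC in something like `guard.run(tid, () -> loadFiles(...))`, where `loadFiles` is a hypothetical placeholder for the tablet load, so that a load arriving after the id is deleted is rejected and one already in progress is waited on.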
   
   
   
