Posted to dev@accumulo.apache.org by "Keith Turner (Commented) (JIRA)" <ji...@apache.org> on 2012/02/21 21:20:48 UTC

[jira] [Commented] (ACCUMULO-422) Bulk import failing when tablet server dies

    [ https://issues.apache.org/jira/browse/ACCUMULO-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212874#comment-13212874 ] 

Keith Turner commented on ACCUMULO-422:
---------------------------------------

Found another case where bulk import was failing because a tablet server died.  

{noformat}
21 19:21:18,386 [fate.Fate] WARN : Failed to execute Repo, tid=7427f4b91dc2fbb0
java.lang.NullPointerException
        at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:286)
        at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:262)
        at org.apache.accumulo.server.master.tableOps.TraceRepo.isReady(TraceRepo.java:50)
        at org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:62)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)

21 19:21:18,398 [thrift.MasterClientService$Processor] ERROR: Internal error processing waitForTableOperation
java.lang.NullPointerException
        at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:286)
        at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:262)
        at org.apache.accumulo.server.master.tableOps.TraceRepo.isReady(TraceRepo.java:50)
        at org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:62)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

This error propagated to the randomwalk test client, causing it to die.

{noformat}
21 19:21:18,411 [randomwalk.Framework] ERROR: Error during random walk
java.lang.Exception: Error running node Concurrent.xml
	at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
	at org.apache.accumulo.server.test.randomwalk.Framework.run(Framework.java:61)
	at org.apache.accumulo.server.test.randomwalk.Framework.main(Framework.java:114)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.accumulo.start.Main$1.run(Main.java:89)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.Exception: Error running node ct.BulkImport
	at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
	at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
	... 8 more
Caused by: org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForTableOperation
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:293)
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:261)
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.importDirectory(TableOperationsImpl.java:938)
	at org.apache.accumulo.server.test.randomwalk.concurrent.BulkImport.visit(BulkImport.java:132)
	at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
	... 9 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing waitForTableOperation
	at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
	at org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForTableOperation(MasterClientService.java:684)
	at org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForTableOperation(MasterClientService.java:665)
	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:83)
	at $Proxy1.waitForTableOperation(Unknown Source)
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.waitForTableOperation(TableOperationsImpl.java:233)
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:275)
	... 13 more

{noformat}
                
> Bulk import failing when tablet server dies
> -------------------------------------------
>
>                 Key: ACCUMULO-422
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-422
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: 10 node cluster running 1.4.0-SNAPSHOT
>            Reporter: Keith Turner
>              Labels: 14_qa_bug
>             Fix For: 1.4.0
>
>
> Saw this issue while running the random walk test with agitation.  The bulk import code picks random tablet servers and asks them to bulk load files.  If a tablet server dies, it takes 30 seconds for the master to see that the ZooKeeper lock was lost.  During this 30-second period the bulk import code will still try to use the tserver and fail.  After it fails three times, it will mark the file as a failure.  This all happens within a second.
> The bulk import code should probably catch TTransportException and blacklist the tablet server for that bulk import transaction.
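
A minimal sketch of the blacklisting idea from the description above, not the actual Accumulo code. Everything in it is hypothetical: `BulkLoadSketch`, `Loader`, and `loadWithBlacklist` are made-up names, and the local `TransportException` class only stands in for Thrift's TTransportException so the example compiles without Thrift on the classpath. The blacklist set is assumed to be created once per bulk import transaction and shared across all files in that transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch: none of these types are Accumulo or Thrift classes.
class BulkLoadSketch {

    /** Stand-in for org.apache.thrift.transport.TTransportException. */
    static class TransportException extends Exception {
        TransportException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical hook that asks one tablet server to bulk load one file. */
    interface Loader {
        void load(String server, String file) throws TransportException;
    }

    /**
     * Tries to load a file through randomly chosen tablet servers, skipping any
     * server that is already on the blacklist for this transaction.
     * A server that throws a transport error is added to the blacklist.
     * Returns the server that loaded the file, or null if every candidate failed.
     */
    static String loadWithBlacklist(List<String> servers, String file, Loader loader,
                                    Set<String> blacklist, Random random) {
        // Only consider servers that have not failed earlier in this transaction.
        List<String> candidates = new ArrayList<>(servers);
        candidates.removeAll(blacklist);

        while (!candidates.isEmpty()) {
            // Pick a random remaining candidate and remove it from the pool.
            String server = candidates.remove(random.nextInt(candidates.size()));
            try {
                loader.load(server, file);
                return server;
            } catch (TransportException e) {
                // The server looks dead; do not ask it again in this transaction.
                blacklist.add(server);
            }
        }
        return null; // no usable server; the caller decides whether to mark the file failed
    }
}
```

With this shape, a dead tablet server costs at most one failed attempt per transaction instead of using up a file's three retries within the same second.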
