Posted to user@accumulo.apache.org by "Losco, Jason [USA]" <Lo...@bah.com> on 2013/09/05 14:00:26 UTC

locked fate threads

I recently tried to remove some tables, and while doing so the shell thread got stuck on an IO error.  A fate print plus some digging into the logs revealed the operations were stuck waiting on WAL resources.  I found a thread in which Eric Newton explained how to manually remove tables by deleting their lines from the !METADATA table using "deletemany -c file", then cleaning up /accumulo/tables/<id> in HDFS.  I've done that; however, the fate threads are still locked and I am unable to delete or fail them.  Additionally, the tables I removed from !METADATA and HDFS still appear in the list returned by the "tables" command in the shell.  Below is the result of a "fate print".  Note that table IDs a and b are the two I removed.



test@c4s> fate print

txid: 4136e024209602eb  status: IN_PROGRESS         op: ChangeTableState  locked: []              locking: [W:b]           top: ChangeTableState

txid: 439193592e93e230  status: IN_PROGRESS         op: TableRangeOp     locked: []              locking: [W:b]           top: TableRangeOp

txid: 1576dca47dfa2c65  status: IN_PROGRESS         op: TableRangeOp     locked: []              locking: [W:b]           top: TableRangeOp

txid: 3ee6232db200f2c7  status: IN_PROGRESS         op: TableRangeOp     locked: []              locking: [W:b]           top: TableRangeOp

txid: 19e5d3349679ff6e  status: IN_PROGRESS         op: TableRangeOp     locked: [W:a]           locking: []              top: TableRangeOpWait

txid: 29204be9d141dc88  status: IN_PROGRESS         op: TableRangeOp     locked: []              locking: [W:b]           top: TableRangeOp

txid: 7d07c50ceb5ac487  status: IN_PROGRESS         op: DeleteTable      locked: []              locking: [W:b]           top: DeleteTable

txid: 72895b4b1a5a1640  status: IN_PROGRESS         op: DeleteTable      locked: []              locking: [W:b]           top: DeleteTable

txid: 6902bcb06c4f5ae7  status: IN_PROGRESS         op: DeleteTable      locked: []              locking: [W:b]           top: DeleteTable

txid: 08db2316eb783ba1  status: IN_PROGRESS         op: TableRangeOp     locked: []              locking: [W:b]           top: TableRangeOp

txid: 6b0b135ca643b709  status: IN_PROGRESS         op: TableRangeOp     locked: []              locking: [W:b]           top: TableRangeOp

txid: 0e174c9af5092e54  status: IN_PROGRESS         op: TableRangeOp     locked: [W:b]           locking: []              top: TableRangeOpWait

12 transactions



Thanks in advance for your help.



losco
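
For readers trying to interpret output like the fate print listing above, the following is a minimal sketch, not part of the original thread, that separates transactions holding a table lock from those still waiting for one. It assumes the output was saved to a text file, that each transaction sits on a single line in the format shown above, and that each lock list contains at most one entry with no embedded spaces. The function name summarize_fate and the file name fate.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: classify transactions in saved "fate print" output.
# Assumption: one transaction per line, laid out as in the listing above.
# summarize_fate is a hypothetical helper name, not an Accumulo tool.

summarize_fate() {
  awk '
    /^txid:/ {
      txid = $2
      op = ""; locked = ""; locking = ""
      # Each label ("op:", "locked:", "locking:") is followed by its value.
      for (i = 1; i < NF; i++) {
        if ($i == "op:")      op = $(i + 1)
        if ($i == "locked:")  locked = $(i + 1)
        if ($i == "locking:") locking = $(i + 1)
      }
      if (locked != "[]")       print "HOLDS   " locked " " txid " " op
      else if (locking != "[]") print "WAITING " locking " " txid " " op
    }
  ' "$1"
}

# Example (hypothetical file name):
#   summarize_fate fate.txt
```

Run against the twelve transactions listed above, this should report two HOLDS lines (19e5d3349679ff6e on W:a and 0e174c9af5092e54 on W:b) and ten WAITING lines, all queued behind W:b.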

Re: [External] Re: locked fate threads

Posted by Eric Newton <er...@gmail.com>.

stop-all probably won't work.  I'm suggesting a cluster-wide kill of all
tablet servers:

$ pssh -h conf/slaves pkill -f =tserve[r]   # <--- requires parallel ssh to be installed

On the master host:

$ pkill -f =master

Wait for the master lock to expire (typically 30 seconds), and kill all the
fate transactions:

$ ./bin/accumulo org.apache.accumulo.server.fate.Admin kill "<txid>"

Then do a start-all and cross your fingers. :-)

-Eric
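
With a dozen stuck transactions, the kill step above has to be repeated once per txid. The sketch below is an illustration rather than anything posted in the thread: it reads saved fate print output and prints one kill command per transaction, using exactly the command form quoted above, so the list can be reviewed before anything is run. The function name fate_kill_commands and the file name fate.txt are hypothetical, and it assumes the commands will be run from the Accumulo install directory, as in the original command.

```shell
#!/bin/sh
# Sketch: dry run that prints one kill command per txid found in
# saved "fate print" output. Nothing is executed by this function.
# fate_kill_commands is a hypothetical helper name.

fate_kill_commands() {
  awk '
    # Only lines that start with "txid:" describe a transaction.
    /^txid:/ {
      printf "./bin/accumulo org.apache.accumulo.server.fate.Admin kill \"%s\"\n", $2
    }
  ' "$1"
}

# Example (hypothetical file name):
#   fate_kill_commands fate.txt            # review the commands first
#   fate_kill_commands fate.txt | sh       # then run them, master stopped
```

Printing the commands instead of running them directly is a deliberate choice here, since killing the wrong transaction is hard to undo.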


On Thu, Sep 5, 2013 at 9:27 AM, Losco, Jason [USA] <Lo...@bah.com> wrote:

> Thanks for the quick response.  I issued the command to take those
> offline, however, they were locked up due to the other threads so it didn’t
> take.  How do I go about deleting those fate transactions?  Fate delete and
> fate fail do not work from the shell.  Are you suggesting a stop-all of
> accumulo, then running something using the actual AdminUtil class to kill
> those transactions?  Any input into how to kick off that process would be
> greatly appreciated.
>
> losco

RE: [External] Re: locked fate threads

Posted by "Losco, Jason [USA]" <Lo...@bah.com>.

Thanks for the quick response.  I issued the command to take those offline; however, they were locked up by the other threads, so it didn't take.  How do I go about deleting those fate transactions?  Fate delete and fate fail do not work from the shell.  Are you suggesting a stop-all of Accumulo, then running something using the actual AdminUtil class to kill those transactions?  Any input into how to kick off that process would be greatly appreciated.



losco




Re: locked fate threads

Posted by Eric Newton <er...@gmail.com>.

I can't believe I posted a note about using deletemany on the !METADATA
table!  That was pretty reckless of me.

If you really deleted your table data doing this, and your table was online
at the time, you need to restart your cluster.

That alone might fix the problem.  Otherwise, you are going to need to kill
the master, delete the fate transactions, restart the master, and properly
delete the tables.

-Eric

