Posted to user@accumulo.apache.org by "Losco, Jason [USA]" <Lo...@bah.com> on 2013/09/05 14:00:26 UTC
locked fate threads
I recently tried to remove some tables, and while doing so a shell thread got stuck on an IO error. A "fate print" plus some digging into the logs revealed the operations were stuck waiting on WAL resources. I found a thread in which Eric Newton explained how to manually remove tables by deleting their lines from the !METADATA table using "deletemany -c file" and then cleaning up /accumulo/tables/<id> in HDFS. I've done that; however, the fate threads are still locked and I am unable to delete or fail them. Additionally, the tables I removed from !METADATA and HDFS still appear in the list returned by the "tables" command in the shell. Below is the result of a "fate print". Note that table ids a and b are the two I removed.
test@c4s> fate print
txid: 4136e024209602eb status: IN_PROGRESS op: ChangeTableState locked: [] locking: [W:b] top: ChangeTableState
txid: 439193592e93e230 status: IN_PROGRESS op: TableRangeOp locked: [] locking: [W:b] top: TableRangeOp
txid: 1576dca47dfa2c65 status: IN_PROGRESS op: TableRangeOp locked: [] locking: [W:b] top: TableRangeOp
txid: 3ee6232db200f2c7 status: IN_PROGRESS op: TableRangeOp locked: [] locking: [W:b] top: TableRangeOp
txid: 19e5d3349679ff6e status: IN_PROGRESS op: TableRangeOp locked: [W:a] locking: [] top: TableRangeOpWait
txid: 29204be9d141dc88 status: IN_PROGRESS op: TableRangeOp locked: [] locking: [W:b] top: TableRangeOp
txid: 7d07c50ceb5ac487 status: IN_PROGRESS op: DeleteTable locked: [] locking: [W:b] top: DeleteTable
txid: 72895b4b1a5a1640 status: IN_PROGRESS op: DeleteTable locked: [] locking: [W:b] top: DeleteTable
txid: 6902bcb06c4f5ae7 status: IN_PROGRESS op: DeleteTable locked: [] locking: [W:b] top: DeleteTable
txid: 08db2316eb783ba1 status: IN_PROGRESS op: TableRangeOp locked: [] locking: [W:b] top: TableRangeOp
txid: 6b0b135ca643b709 status: IN_PROGRESS op: TableRangeOp locked: [] locking: [W:b] top: TableRangeOp
txid: 0e174c9af5092e54 status: IN_PROGRESS op: TableRangeOp locked: [W:b] locking: [] top: TableRangeOpWait
12 transactions
Thanks in advance for your help.
losco
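For readers following along: a listing like the one above is easier to reason about when reduced to which transaction holds a lock and which ones are waiting on it. Here is a minimal sketch, assuming the "fate print" output has been saved to a file and that each transaction sits on one line in exactly the format shown in this post, with at most one entry per lock list. The function name summarize_fate is made up for illustration and is not part of Accumulo.

```shell
#!/bin/sh
# Summarize saved "fate print" output read from stdin.
# Assumed line format (taken from the listing in this thread):
#   txid: <hex> status: <STATUS> op: <Op> locked: [..] locking: [..] top: <Step>
# Assumes each lock list holds at most one entry, e.g. [W:b] or [].
summarize_fate() {
  awk '
    $1 == "txid:" {
      held = ""; waiting = ""
      # Pick up the token that follows each of the two lock labels.
      for (i = 2; i < NF; i++) {
        if ($i == "locked:")  held = $(i + 1)
        if ($i == "locking:") waiting = $(i + 1)
      }
      # A non-empty "locked" list means the transaction holds the lock.
      state = (held != "[]") ? "HOLDS" : "WAITS"
      lock  = (held != "[]") ? held : waiting
      print $2, state, lock
    }
  '
}
```

Usage would be something like "summarize_fate < fate-print.txt", where fate-print.txt is a hypothetical file containing the pasted listing. Lines that do not start with "txid:" (such as the "12 transactions" trailer) are ignored.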
Re: [External] Re: locked fate threads
Posted by Eric Newton <er...@gmail.com>.
stop-all probably won't work. I'm suggesting a cluster-wide kill of all
tablet servers:
$ pssh -h conf/slaves pkill -f =tserve[r]   # <--- requires parallel ssh to be installed
On the master host:
$ pkill -f =master
Wait for the master lock to expire (typically 30 seconds), and kill all the
fate transactions:
$ ./bin/accumulo org.apache.accumulo.server.fate.Admin kill "<txid>"
Then do a start-all and cross your fingers. :-)
-Eric
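With a dozen stuck transactions, the kill step above has to be repeated once per txid. The following is a rough sketch of a loop around the exact command quoted in this message, not a tested recovery procedure. The function name kill_fate_txids and the DRY_RUN variable are invented for illustration; the thread does not say whether the Admin class accepts more than one txid per call, so this runs it once per id. It is only meant to be used after the tablet servers and master are down and the master lock has expired, as described above.

```shell
#!/bin/sh
# Run the kill command from this message once per transaction id.
# Reads one txid per line from stdin; blank lines are skipped.
# DRY_RUN (hypothetical, defaults to 1) only prints the commands;
# set DRY_RUN=0 to actually execute them from the Accumulo install directory.
kill_fate_txids() {
  while read -r txid; do
    [ -n "$txid" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "./bin/accumulo org.apache.accumulo.server.fate.Admin kill $txid"
    else
      # Redirect stdin so the command cannot consume the remaining txids.
      ./bin/accumulo org.apache.accumulo.server.fate.Admin kill "$txid" </dev/null
    fi
  done
}
```

For example, "kill_fate_txids < txids.txt" (txids.txt being a hypothetical file with one id per line) prints the commands for review. Checking that output before running with DRY_RUN=0 seems prudent, since killing the wrong transaction is not easily undone.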
On Thu, Sep 5, 2013 at 9:27 AM, Losco, Jason [USA] <Lo...@bah.com> wrote:
> Thanks for the quick response. I issued the command to take those
> offline; however, they were locked up due to the other threads, so it didn't
> take. How do I go about deleting those fate transactions? Fate delete and
> fate fail do not work from the shell. Are you suggesting a stop-all of
> Accumulo, then running something using the actual AdminUtil class to kill
> those transactions? Any input into how to kick off that process would be
> greatly appreciated.
RE: [External] Re: locked fate threads
Posted by "Losco, Jason [USA]" <Lo...@bah.com>.
Thanks for the quick response. I issued the command to take those offline; however, they were locked up due to the other threads, so it didn't take. How do I go about deleting those fate transactions? Fate delete and fate fail do not work from the shell. Are you suggesting a stop-all of Accumulo, then running something using the actual AdminUtil class to kill those transactions? Any input into how to kick off that process would be greatly appreciated.
losco
Re: locked fate threads
Posted by Eric Newton <er...@gmail.com>.
I can't believe I posted a note about using deletemany on the !METADATA
table! That was pretty reckless of me.
If you really deleted your table data doing this, and your table was online
at the time, you need to restart your cluster.
That alone might fix the problem. Otherwise, you are going to need to kill
the master, delete the fate transactions, restart the master, and properly
delete the tables.
-Eric