Posted to notifications@accumulo.apache.org by "John Vines (JIRA)" <ji...@apache.org> on 2014/03/17 22:36:47 UTC
[jira] [Reopened] (ACCUMULO-2361) droptable created infinite METADATA scan loop
[ https://issues.apache.org/jira/browse/ACCUMULO-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Vines reopened ACCUMULO-2361:
----------------------------------
So I'm seeing this again, I think. It is still with sqrrl extensions, but I believe they're irrelevant. Critical items to note:
Repeatedly seeing {code}2014-03-17 11:13:50,784 [tabletserver.TabletServer] INFO : told to unload tablet that was not being served 3tx;^@^@�;^@^@�" {code}
Those are actually the same character, as the hexdump shows:
{code}root@ip-10-10-1-20:/data0/logs/accumulo# tail tserver_ip-10-10-1-20.debug.log | grep -a "not being served" | tail -n 1 | hexdump -C
00000000 32 30 31 34 2d 30 33 2d 31 37 20 31 36 3a 34 36 |2014-03-17 16:46|
00000010 3a 35 35 2c 34 31 31 20 5b 74 61 62 6c 65 74 73 |:55,411 [tablets|
00000020 65 72 76 65 72 2e 54 61 62 6c 65 74 53 65 72 76 |erver.TabletServ|
00000030 65 72 5d 20 49 4e 46 4f 20 3a 20 74 6f 6c 64 20 |er] INFO : told |
00000040 74 6f 20 75 6e 6c 6f 61 64 20 74 61 62 6c 65 74 |to unload tablet|
00000050 20 74 68 61 74 20 77 61 73 20 6e 6f 74 20 62 65 | that was not be|
00000060 69 6e 67 20 73 65 72 76 65 64 20 33 74 78 3b 00 |ing served 3tx;.|
00000070 00 ef bf bd 3b 00 00 ef bf bd 0a |....;......|
0000007b
{code}
So it's trying to use a key extent with \x00\x00\xEF\xBF\xBD as both the end row and the prev end row, which is weird enough and MAY be relevant (side question: how is this happening?). There's also no sign of this \x00\x00\xEF\xBF\xBD tablet in the !METADATA table. Metadata table view:
{code}
root@accumulo !METADATA> scan -b 3tx -e 3txa -c ~tab -t !METADATA -np
3tx;\x00\x00\x06f ~tab:~pr [] \x00
3tx;\x00\x00\x0C\xCC ~tab:~pr [] \x01\x00\x00\x06f
3tx;\x00\x00\x133 ~tab:~pr [] \x01\x00\x00\x0C\xCC
3tx;\x00\x00\x19\x99 ~tab:~pr [] \x01\x00\x00\x133
3tx;\x00\x00 ~tab:~pr [] \x01\x00\x00\x19\x99
3tx;\x00\x00&f ~tab:~pr [] \x01\x00\x00
3tx;\x00\x00,\xCC ~tab:~pr [] \x01\x00\x00&f
3tx;\x00\x0033 ~tab:~pr [] \x01\x00\x00,\xCC
3tx;\x00\x009\x99 ~tab:~pr [] \x01\x00\x0033
3tx;\x00\x00@ ~tab:~pr [] \x01\x00\x009\x99
3tx;\x00\x00Ff ~tab:~pr [] \x01\x00\x00@
3tx;\x00\x00L\xCC ~tab:~pr [] \x01\x00\x00Ff
3tx;\x00\x00S3 ~tab:~pr [] \x01\x00\x00L\xCC
3tx;\x00\x00Y\x99 ~tab:~pr [] \x01\x00\x00S3
3tx;\x00\x00` ~tab:~pr [] \x01\x00\x00Y\x99
3tx;\x00\x00ff ~tab:~pr [] \x01\x00\x00`
3tx;\x00\x00l\xCC ~tab:~pr [] \x01\x00\x00ff
3tx;\x00\x00s3 ~tab:~pr [] \x01\x00\x00l\xCC
3tx;\x00\x00y\x99 ~tab:~pr [] \x01\x00\x00s3
3tx;\x00\x00\x80 ~tab:~pr [] \x01\x00\x00y\x99
3tx;\x00\x00\x86f ~tab:~pr [] \x01\x00\x00\x80
3tx;\x00\x00\x8C\xCC ~tab:~pr [] \x01\x00\x00\x86f
3tx;\x00\x00\x933 ~tab:~pr [] \x01\x00\x00\x8C\xCC
3tx;\x00\x00\x99\x99 ~tab:~pr [] \x01\x00\x00\x933
3tx;\x00\x00\xA0 ~tab:~pr [] \x01\x00\x00\x99\x99
3tx;\x00\x00\xA6f ~tab:~pr [] \x01\x00\x00\xA0
3tx;\x00\x00\xAC\xCC ~tab:~pr [] \x01\x00\x00\xA6f
3tx;\x00\x00\xB33 ~tab:~pr [] \x01\x00\x00\xAC\xCC
3tx;\x00\x00\xB9\x99 ~tab:~pr [] \x01\x00\x00\xB33
3tx;\x00\x00\xC0 ~tab:~pr [] \x01\x00\x00\xB9\x99
3tx;\x00\x00\xC6f ~tab:~pr [] \x01\x00\x00\xC0
3tx;\x00\x00\xCC\xCC ~tab:~pr [] \x01\x00\x00\xC6f
3tx;\x00\x00\xD33 ~tab:~pr [] \x01\x00\x00\xCC\xCC
3tx;\x00\x00\xD9\x99 ~tab:~pr [] \x01\x00\x00\xD33
3tx;\x00\x00\xE0 ~tab:~pr [] \x01\x00\x00\xD9\x99
3tx;\x00\x00\xE6f ~tab:~pr [] \x01\x00\x00\xE0
3tx;\x00\x00\xEC\xCC ~tab:~pr [] \x01\x00\x00\xE6f
3tx;\x00\x00\xF9\x99 ~tab:~pr [] \x01\x00\x00\xEC\xCC
3tx;\x00\x01 ~tab:~pr [] \x01\x00\x00\xF9\x99
3tx< ~tab:~pr [] \x01\x00\x01
{code}
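In the scan above, each ~tab:~pr value appears to be a marker byte (\x00 for the first tablet, \x01 otherwise) followed by the previous tablet's end row. One quick way to rule out a broken tablet chain is to check that every ~pr matches the end row of the row before it. This is a hypothetical Java sketch (the class and method names are made up, not Accumulo API), assuming that ~pr encoding and the shell's \xNN escaping:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// Sketch only: walks {end row, ~tab:~pr value} pairs in scan order and checks
// that every tablet's prev end row equals the end row of the tablet before it.
// Assumption: ~pr holds 0x00 for "no prev end row" and 0x01 + prev end row
// bytes otherwise, which is what the scan output appears to show.
class PrevRowChainCheck {

    // Turns the shell's escaped form (e.g. "\x00\x00\x06f") back into raw bytes.
    static byte[] unescape(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 3; // skip the "xNN" just consumed
            } else {
                out.write(c); // printable bytes are shown as-is by the shell
            }
        }
        return out.toByteArray();
    }

    // Returns null for the "no previous end row" marker, else the row bytes.
    static byte[] decodePrevEndRow(byte[] value) {
        if (value.length == 1 && value[0] == 0) {
            return null;
        }
        if (value.length == 0 || value[0] != 1) {
            throw new IllegalArgumentException("unexpected ~pr encoding");
        }
        return Arrays.copyOfRange(value, 1, value.length);
    }

    // Index of the first tablet whose prev end row does not match the previous
    // tablet's end row, or -1 if the chain is intact. The first pair is not
    // checked because a partial scan may start mid-table.
    static int firstBreak(List<String[]> rows) {
        byte[] previousEnd = null;
        for (int i = 0; i < rows.size(); i++) {
            byte[] end = unescape(rows.get(i)[0]);
            byte[] prev = decodePrevEndRow(unescape(rows.get(i)[1]));
            if (i > 0 && !Arrays.equals(prev, previousEnd)) {
                return i;
            }
            previousEnd = end;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Last three split-point tablets of table 3tx, copied from the scan.
        List<String[]> rows = Arrays.asList(
                new String[] {"\\x00\\x00\\xEC\\xCC", "\\x01\\x00\\x00\\xE6f"},
                new String[] {"\\x00\\x00\\xF9\\x99", "\\x01\\x00\\x00\\xEC\\xCC"},
                new String[] {"\\x00\\x01", "\\x01\\x00\\x00\\xF9\\x99"});
        System.out.println(firstBreak(rows)); // -1: each ~pr matches the row before it
    }
}
```

Feeding it the whole scan (minus the 3tx< default tablet, whose end row is null) would flag any tablet whose ~pr points at a row that no longer precedes it.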
Tablet server logs:
{code}
2014-03-15 03:25:40,038 [tabletserver.TabletServer] INFO : unloaded 3tx;^@^@�;^@^@ٙ
2014-03-15 03:25:40,038 [tabletserver.Tablet] DEBUG: initiateClose(saveState=false queueMinC=false disableWrites=false) 3tx;^@^@�;^@^@�
2014-03-15 03:25:40,038 [tabletserver.Tablet] DEBUG: completeClose(saveState=false completeClose=true) 3tx;^@^@�;^@^@�
2014-03-15 03:25:40,038 [tabletserver.Tablet] TABLET_HIST: 3tx;^@^@�;^@^@� closed
2014-03-15 03:25:40,038 [tabletserver.TabletServer] DEBUG: Unassigning 3tx;^@^@�;^@^@�@(null,10.10.1.20:9997[1449daf9ff50ddc],null)
2014-03-15 03:25:40,042 [tabletserver.TabletServer] DEBUG: MultiScanSess 10.10.1.20:40511 2 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 ranges:1)
2014-03-15 03:25:40,043 [tabletserver.TabletServer] INFO : unloaded 3tx;^@^@�;^@^@�
2014-03-15 03:25:40,043 [tabletserver.TabletServer] INFO : told to unload tablet that was not being served 3tx;^@^@�;^@^@�
2014-03-15 03:25:40,043 [tabletserver.TabletServer] INFO : told to unload tablet that was not being served 3tx;^@^@�;^@^@�
{code}
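The bytes ef bf bd in the hexdump are the UTF-8 encoding of U+FFFD, the Unicode replacement character. One possible answer to the side question (an assumption, not established in this thread) is that rows are decoded as UTF-8 with replacement before being logged: any row bytes that are not valid UTF-8 then print as U+FFFD, so different extents can look identical in the log, and the logged bytes need not exist in !METADATA. The first "unloaded" line is consistent with that: its prev end row prints as ٙ (U+0659), which is what the valid UTF-8 pair \xD9\x99 decodes to, and \x00\x00\xD9\x99 is a real row in the scan. This sketch uses only the JDK; the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Sketch: how row bytes render when decoded as UTF-8 with replacement. The
// String(byte[], Charset) constructor always replaces malformed input with
// U+FFFD. Whether the tablet server logs rows this way is an assumption.
class LossyRowRendering {

    // Decode as UTF-8, then re-encode: the bytes that would end up in the log.
    static byte[] asLogged(byte[] row) {
        String shown = new String(row, StandardCharsets.UTF_8);
        return shown.getBytes(StandardCharsets.UTF_8);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] e0 = {0x00, 0x00, (byte) 0xE0};                // end row of one tablet in the scan
        byte[] c0 = {0x00, 0x00, (byte) 0xC0};                // end row of a different tablet
        byte[] d999 = {0x00, 0x00, (byte) 0xD9, (byte) 0x99}; // valid two-byte UTF-8 sequence

        System.out.println(hex(asLogged(e0))); // 00 00 ef bf bd
        System.out.println(hex(asLogged(c0))); // 00 00 ef bf bd (same as above)
        System.out.println(new String(d999, StandardCharsets.UTF_8).equals("\0\0\u0659")); // true
    }
}
```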
Looks like a race between unassigning and reattempting to unassign, which is something we've fought with in the past.
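For comparison, this is what a tablet-server-side unload looks like when repeated requests are harmless: the first request removes the tablet atomically, and every later one finds nothing and only logs. It is a hypothetical Java sketch (class, method, and extent names are made up; it is not Accumulo's TabletServer code) that reuses the two log messages above as labels:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Accumulo code: an unload handler keyed by extent.
class UnloadRaceSketch {
    // extent -> tablet handle; a plain Object stands in for a real tablet.
    private final ConcurrentMap<String, Object> onlineTablets = new ConcurrentHashMap<>();

    void load(String extent) {
        onlineTablets.put(extent, new Object());
    }

    // remove() is atomic, so only one caller gets the tablet back.
    boolean unload(String extent) {
        if (onlineTablets.remove(extent) == null) {
            System.out.println("told to unload tablet that was not being served " + extent);
            return false;
        }
        System.out.println("unloaded " + extent);
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        UnloadRaceSketch server = new UnloadRaceSketch();
        String extent = "3tx;b;a"; // made-up extent, not one from the logs
        server.load(extent);

        // Three overlapping unload requests for the same extent.
        AtomicInteger unloaded = new AtomicInteger();
        Thread[] requests = new Thread[3];
        for (int i = 0; i < requests.length; i++) {
            requests[i] = new Thread(() -> {
                if (server.unload(extent)) {
                    unloaded.incrementAndGet();
                }
            });
            requests[i].start();
        }
        for (Thread t : requests) {
            t.join();
        }
        System.out.println("requests that actually unloaded: " + unloaded.get()); // always 1
    }
}
```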
Bouncing the master does not resolve this.
A scan of the table's loc entries shows:
{code}
root@accumulo> scan -b 3tx -e 3txa -c loc -t !METADATA
3tx;\x00\x00\xEC\xCC loc:1449daf9ff50ddc [] 10.10.1.20:9997
3tx;\x00\x00\xF9\x99 loc:1449daf9ff50ddc [] 10.10.1.20:9997
{code}
This is strange, since we only see one erroring unassignment but two tablets with locations.
I then bounced the tserver and it resolved itself.
> droptable created infinite METADATA scan loop
> ---------------------------------------------
>
> Key: ACCUMULO-2361
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2361
> Project: Accumulo
> Issue Type: Bug
> Environment: accumulo-1.5.0 with sqrrl extensions.
> Reporter: Chris McCubbin
> Assignee: Eric Newton
> Attachments: Screen Shot 2014-02-12 at 2.27.11 PM.png, Screen Shot 2014-02-12 at 2.28.09 PM.png, jstack1.txt, jstack2.txt, jstack3.txt, masterJstack.txt, masterJstack2.txt, masterJstack3.txt, short.debug.log
>
>
> Working with [~vines] on this one...
> Setup: Created a couple of tables, added some data, then dropped them. The drop hangs, and !METADATA (which has ~400 entries) is scanned in what looks like an infinite loop.
> The table being dropped looks like this in !METADATA:
> {code}
> root@sqrrl> scan -b 3 -e 5 -t !METADATA
> 4;\x00\x00\x06f srv:dir [] /t-00000b6
> 4;\x00\x00\x06f srv:lock [] tservers/10.10.1.209:9997/zlock-0000000000$144274cd317000b
> 4;\x00\x00\x06f srv:time [] M0
> 4;\x00\x00\x06f ~tab:~pr [] \x00
> 4;\x00\x00\x0C\xCC loc:144274cd3170008 [] 10.10.1.107:9997
> 4;\x00\x00\x0C\xCC srv:dir [] /t-00000bj
> 4;\x00\x00\x0C\xCC srv:lock [] tservers/10.10.1.209:9997/zlock-0000000000$144274cd317000b
> 4;\x00\x00\x0C\xCC srv:time [] M0
> 4;\x00\x00\x0C\xCC ~tab:~pr [] \x01\x00\x00\x06f
> 4;\x00\x00\x133 srv:dir [] /t-000002h
> 4;\x00\x00\x133 srv:lock [] tservers/10.10.1.209:9997/zlock-0000000000$144274cd317000b
> 4;\x00\x00\x133 srv:time [] M0
> 4;\x00\x00\x133 ~tab:~pr [] \x01\x00\x00\x0C\xCC
> {code}
> We think this may be the relevant message in the master debug logs:
> {code}
> 2014-02-12 19:13:31,397 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,459 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,524 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,588 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,662 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,725 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,788 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,854 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,917 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> ...etc
> {code}
> Graceful accumulo reboot hangs.
> Hard reboot of everything (control-c'd) clears the problem.
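In the quoted CleanUp message, 4;^@^@^L?;^@^@^Ff appears to be the extent printed in caret notation, where ^X stands for the byte X XOR 0x40 (^@ = 0x00, ^L = 0x0C, ^F = 0x06). Read that way, it matches the tablet with end row \x00\x00\x0C\xCC and prev end row \x00\x00\x06f, which is the one that still has a loc entry in the quoted scan, assuming the ? is a placeholder for the non-ASCII byte \xCC. A hypothetical helper (names made up) for doing that translation:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: turn caret notation from the log ("^@" = 0x00,
// "^L" = 0x0C, ...) back into bytes, then print them the way the shell does.
// Other characters are kept literally, so a '?' placeholder stays 0x3F
// (assumption: the byte it replaced cannot be recovered from the log text).
// A literal '^' in row data would be ambiguous; this sketch ignores that case.
class CaretNotation {

    static byte[] decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '^' && i + 1 < s.length()) {
                out.write(s.charAt(i + 1) ^ 0x40); // '@' -> 0x00, 'L' -> 0x0C, 'F' -> 0x06
                i++;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Printable ASCII as-is, everything else as \xNN, like the shell output.
    static String escape(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xff;
            if (v >= 0x20 && v < 0x7f) {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // End row and prev end row from the CleanUp message, split on ';'.
        System.out.println(escape(decode("^@^@^L?"))); // \x00\x00\x0C?
        System.out.println(escape(decode("^@^@^Ff"))); // \x00\x00\x06f
    }
}
```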
--
This message was sent by Atlassian JIRA
(v6.2#6252)