Posted to notifications@accumulo.apache.org by "John Vines (JIRA)" <ji...@apache.org> on 2014/03/17 22:36:47 UTC

[jira] [Reopened] (ACCUMULO-2361) droptable created infinite METADATA scan loop

     [ https://issues.apache.org/jira/browse/ACCUMULO-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Vines reopened ACCUMULO-2361:
----------------------------------


So I'm seeing this again, I think. It is still with sqrrl extensions, but I believe that's irrelevant. Critical items to note-

Repeatedly seeing {code}2014-03-17 11:13:50,784 [tabletserver.TabletServer] INFO : told to unload tablet that was not being served 3tx;^@^@�;^@^@�" {code}

Those are actually the same character-
{code}root@ip-10-10-1-20:/data0/logs/accumulo# tail tserver_ip-10-10-1-20.debug.log | grep -a "not being served" | tail -n 1 | hexdump -C
00000000  32 30 31 34 2d 30 33 2d  31 37 20 31 36 3a 34 36  |2014-03-17 16:46|
00000010  3a 35 35 2c 34 31 31 20  5b 74 61 62 6c 65 74 73  |:55,411 [tablets|
00000020  65 72 76 65 72 2e 54 61  62 6c 65 74 53 65 72 76  |erver.TabletServ|
00000030  65 72 5d 20 49 4e 46 4f  20 3a 20 74 6f 6c 64 20  |er] INFO : told |
00000040  74 6f 20 75 6e 6c 6f 61  64 20 74 61 62 6c 65 74  |to unload tablet|
00000050  20 74 68 61 74 20 77 61  73 20 6e 6f 74 20 62 65  | that was not be|
00000060  69 6e 67 20 73 65 72 76  65 64 20 33 74 78 3b 00  |ing served 3tx;.|
00000070  00 ef bf bd 3b 00 00 ef  bf bd 0a                 |....;......|
0000007b
{code}
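
For reference, the bytes ef bf bd in the hexdump are the UTF-8 encoding of U+FFFD, the Unicode replacement character. The sketch below shows how a lossy UTF-8 decode turns different invalid bytes into that same sequence. It is only an illustration: whether the tserver logging path decodes row bytes this way is an assumption, and render_like_log and the sample extents are hypothetical.

```python
def render_like_log(extent: bytes) -> bytes:
    """Decode bytes as UTF-8 with replacement, then re-encode (a lossy round trip)."""
    return extent.decode("utf-8", errors="replace").encode("utf-8")


# U+FFFD REPLACEMENT CHARACTER is three bytes in UTF-8: EF BF BD.
assert "\ufffd".encode("utf-8") == b"\xef\xbf\xbd"

# Hypothetical extent: end row \x00\x00\xE0, prev end row \x00\x00\xD9\x99.
# \xE0 on its own is not valid UTF-8, while \xD9\x99 is a valid two-byte sequence.
raw = b"3tx;\x00\x00\xe0;\x00\x00\xd9\x99"
print(render_like_log(raw).hex(" "))

# Two different invalid single bytes become indistinguishable after the round trip.
print(render_like_log(b"\xe0") == render_like_log(b"\xc0"))
```

If the log is written this way, the rendered text alone may not be enough to tell which binary row a message refers to.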

So it's trying to use a key extent with \x00\x00\xEF\xBF\xBD as both the end row and prev end row, which is weird enough and MAY be relevant (side question: how is this happening?), and there's no sign of this \x00\x00\xEF\xBF\xBD tablet in the !METADATA table. Metadata table view-

{code}
root@accumulo !METADATA> scan -b 3tx -e 3txa -c ~tab -t !METADATA -np
3tx;\x00\x00\x06f ~tab:~pr []    \x00
3tx;\x00\x00\x0C\xCC ~tab:~pr []    \x01\x00\x00\x06f
3tx;\x00\x00\x133 ~tab:~pr []    \x01\x00\x00\x0C\xCC
3tx;\x00\x00\x19\x99 ~tab:~pr []    \x01\x00\x00\x133
3tx;\x00\x00  ~tab:~pr []    \x01\x00\x00\x19\x99
3tx;\x00\x00&f ~tab:~pr []    \x01\x00\x00 
3tx;\x00\x00,\xCC ~tab:~pr []    \x01\x00\x00&f
3tx;\x00\x0033 ~tab:~pr []    \x01\x00\x00,\xCC
3tx;\x00\x009\x99 ~tab:~pr []    \x01\x00\x0033
3tx;\x00\x00@ ~tab:~pr []    \x01\x00\x009\x99
3tx;\x00\x00Ff ~tab:~pr []    \x01\x00\x00@
3tx;\x00\x00L\xCC ~tab:~pr []    \x01\x00\x00Ff
3tx;\x00\x00S3 ~tab:~pr []    \x01\x00\x00L\xCC
3tx;\x00\x00Y\x99 ~tab:~pr []    \x01\x00\x00S3
3tx;\x00\x00` ~tab:~pr []    \x01\x00\x00Y\x99
3tx;\x00\x00ff ~tab:~pr []    \x01\x00\x00`
3tx;\x00\x00l\xCC ~tab:~pr []    \x01\x00\x00ff
3tx;\x00\x00s3 ~tab:~pr []    \x01\x00\x00l\xCC
3tx;\x00\x00y\x99 ~tab:~pr []    \x01\x00\x00s3
3tx;\x00\x00\x80 ~tab:~pr []    \x01\x00\x00y\x99
3tx;\x00\x00\x86f ~tab:~pr []    \x01\x00\x00\x80
3tx;\x00\x00\x8C\xCC ~tab:~pr []    \x01\x00\x00\x86f
3tx;\x00\x00\x933 ~tab:~pr []    \x01\x00\x00\x8C\xCC
3tx;\x00\x00\x99\x99 ~tab:~pr []    \x01\x00\x00\x933
3tx;\x00\x00\xA0 ~tab:~pr []    \x01\x00\x00\x99\x99
3tx;\x00\x00\xA6f ~tab:~pr []    \x01\x00\x00\xA0
3tx;\x00\x00\xAC\xCC ~tab:~pr []    \x01\x00\x00\xA6f
3tx;\x00\x00\xB33 ~tab:~pr []    \x01\x00\x00\xAC\xCC
3tx;\x00\x00\xB9\x99 ~tab:~pr []    \x01\x00\x00\xB33
3tx;\x00\x00\xC0 ~tab:~pr []    \x01\x00\x00\xB9\x99
3tx;\x00\x00\xC6f ~tab:~pr []    \x01\x00\x00\xC0
3tx;\x00\x00\xCC\xCC ~tab:~pr []    \x01\x00\x00\xC6f
3tx;\x00\x00\xD33 ~tab:~pr []    \x01\x00\x00\xCC\xCC
3tx;\x00\x00\xD9\x99 ~tab:~pr []    \x01\x00\x00\xD33
3tx;\x00\x00\xE0 ~tab:~pr []    \x01\x00\x00\xD9\x99
3tx;\x00\x00\xE6f ~tab:~pr []    \x01\x00\x00\xE0
3tx;\x00\x00\xEC\xCC ~tab:~pr []    \x01\x00\x00\xE6f
3tx;\x00\x00\xF9\x99 ~tab:~pr []    \x01\x00\x00\xEC\xCC
3tx;\x00\x01 ~tab:~pr []    \x01\x00\x00\xF9\x99
3tx< ~tab:~pr []    \x01\x00\x01
{code}
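
A listing like the one above can be checked mechanically for holes in the prev-row chain. This is a minimal sketch, not an Accumulo tool: it assumes the input is shell output copied as text, that a ~pr value starting with \x01 carries the previous end row, and that \x00 means there is none, as the listing suggests. The names unescape and check_chain are hypothetical.

```python
import re

MARKER = " ~tab:~pr []    "  # separator exactly as it appears in the shell output


def unescape(text: str) -> bytes:
    """Turn shell-escaped text such as '\\x00\\x0C\\xCC' into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        hex_digits = text[i + 2:i + 4]
        if text.startswith("\\x", i) and re.fullmatch(r"[0-9A-Fa-f]{2}", hex_digits):
            out.append(int(hex_digits, 16))
            i += 4
        else:
            out.append(ord(text[i]))  # assumes the remaining characters are ASCII
            i += 1
    return bytes(out)


def check_chain(lines):
    """Return (row, prev_row, expected) for every tablet whose ~pr does not
    match the end row of the tablet listed just before it."""
    problems = []
    last_end = None  # end row of the previous tablet; None before the first one
    for line in lines:
        row, value = line.split(MARKER)
        # A row without ';' (for example '3tx<') is the last tablet and has no end row.
        end_row = unescape(row.split(";", 1)[1]) if ";" in row else None
        encoded = unescape(value)
        prev_row = encoded[1:] if encoded[:1] == b"\x01" else None
        if prev_row != last_end:
            problems.append((row, prev_row, last_end))
        last_end = end_row
    return problems


# First three entries of the listing, copied verbatim.
sample = [
    r"3tx;\x00\x00\x06f ~tab:~pr []    \x00",
    r"3tx;\x00\x00\x0C\xCC ~tab:~pr []    \x01\x00\x00\x06f",
    r"3tx;\x00\x00\x133 ~tab:~pr []    \x01\x00\x00\x0C\xCC",
]
print(check_chain(sample))
```

An empty result means every tablet's ~pr points at the tablet listed before it; any tuple returned marks a break in the chain.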

Tablet server logs-
{code}
2014-03-15 03:25:40,038 [tabletserver.TabletServer] INFO : unloaded 3tx;^@^@�;^@^@ٙ
2014-03-15 03:25:40,038 [tabletserver.Tablet] DEBUG: initiateClose(saveState=false queueMinC=false disableWrites=false) 3tx;^@^@�;^@^@�
2014-03-15 03:25:40,038 [tabletserver.Tablet] DEBUG: completeClose(saveState=false completeClose=true) 3tx;^@^@�;^@^@�
2014-03-15 03:25:40,038 [tabletserver.Tablet] TABLET_HIST: 3tx;^@^@�;^@^@� closed
2014-03-15 03:25:40,038 [tabletserver.TabletServer] DEBUG: Unassigning 3tx;^@^@�;^@^@�@(null,10.10.1.20:9997[1449daf9ff50ddc],null)
2014-03-15 03:25:40,042 [tabletserver.TabletServer] DEBUG: MultiScanSess 10.10.1.20:40511 2 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 ranges:1) 
2014-03-15 03:25:40,043 [tabletserver.TabletServer] INFO : unloaded 3tx;^@^@�;^@^@�
2014-03-15 03:25:40,043 [tabletserver.TabletServer] INFO : told to unload tablet that was not being served 3tx;^@^@�;^@^@�
2014-03-15 03:25:40,043 [tabletserver.TabletServer] INFO : told to unload tablet that was not being served 3tx;^@^@�;^@^@�
{code}

Looks like a race between unassigning and reattempting to unassign, which is something we've fought with in the past.

Bouncing the master does not resolve this.

A scan of the loc column in !METADATA shows
{code}
root@accumulo> scan -b 3tx -e 3txa -c loc -t !METADATA
3tx;\x00\x00\xEC\xCC loc:1449daf9ff50ddc []    10.10.1.20:9997
3tx;\x00\x00\xF9\x99 loc:1449daf9ff50ddc []    10.10.1.20:9997
{code}

Which is strange, since we only see one erroring unassignment but two tablets with locations.

I then bounced the tserver and it resolved itself.


> droptable created infinite METADATA scan loop
> ---------------------------------------------
>
>                 Key: ACCUMULO-2361
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2361
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: accumulo-1.5.0 with sqrrl extensions.
>            Reporter: Chris McCubbin
>            Assignee: Eric Newton
>         Attachments: Screen Shot 2014-02-12 at 2.27.11 PM.png, Screen Shot 2014-02-12 at 2.28.09 PM.png, jstack1.txt, jstack2.txt, jstack3.txt, masterJstack.txt, masterJstack2.txt, masterJstack3.txt, short.debug.log
>
>
> Working with [~vines] on this one...
> Setup: Created a couple tables, added some data, then dropped them. The drop hangs and !METADATA (which has ~400 entries) is scanned in what looks like an infinite loop.
> The table being dropped looks like this in !METADATA:
> {code}
> root@sqrrl> scan -b 3 -e 5 -t !METADATA
> 4;\x00\x00\x06f srv:dir []    /t-00000b6
> 4;\x00\x00\x06f srv:lock []    tservers/10.10.1.209:9997/zlock-0000000000$144274cd317000b
> 4;\x00\x00\x06f srv:time []    M0
> 4;\x00\x00\x06f ~tab:~pr []    \x00
> 4;\x00\x00\x0C\xCC loc:144274cd3170008 []    10.10.1.107:9997
> 4;\x00\x00\x0C\xCC srv:dir []    /t-00000bj
> 4;\x00\x00\x0C\xCC srv:lock []    tservers/10.10.1.209:9997/zlock-0000000000$144274cd317000b
> 4;\x00\x00\x0C\xCC srv:time []    M0
> 4;\x00\x00\x0C\xCC ~tab:~pr []    \x01\x00\x00\x06f
> 4;\x00\x00\x133 srv:dir []    /t-000002h
> 4;\x00\x00\x133 srv:lock []    tservers/10.10.1.209:9997/zlock-0000000000$144274cd317000b
> 4;\x00\x00\x133 srv:time []    M0
> 4;\x00\x00\x133 ~tab:~pr []    \x01\x00\x00\x0C\xCC
> {code}
> We think this may be the relevant message in the master debug logs:
> {code}
> 2014-02-12 19:13:31,397 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,459 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,524 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,588 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,662 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,725 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,788 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,854 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> 2014-02-12 19:13:31,917 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4 saw inconsistencynull 4;^@^@^L?;^@^@^Ff
> ...etc
> {code}
> Graceful accumulo reboot hangs. 
> Hard reboot of everything (control-c'd) clears the problem.


