Posted to notifications@accumulo.apache.org by "Christopher Tubbs (JIRA)" <ji...@apache.org> on 2019/04/23 23:27:00 UTC

[jira] [Resolved] (ACCUMULO-4542) Tablet left in bad state after bulk import timeout

     [ https://issues.apache.org/jira/browse/ACCUMULO-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs resolved ACCUMULO-4542.
-----------------------------------------
    Resolution: Cannot Reproduce

Can't reproduce, and this is OBE (overtaken by events) given the new 2.0 bulk import API.

> Tablet left in bad state after bulk import timeout
> --------------------------------------------------
>
>                 Key: ACCUMULO-4542
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4542
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.7.2
>            Reporter: John Vines
>            Priority: Major
>
> At one point we saw a large number of network issues on a cluster. The cause has still not been pinpointed, but it resulted in a lot of RPC exceptions and the like.
> While these network issues were happening, a bulk import was kicked off for a single file. This file was assigned to two tablets (which both happened to be on the same server). Unfortunately, each of the 3 attempts bulk import made to assign this file to this tablet ended in an RPC exception due to a socket timeout. After the three failures, the bulk import moved the file to the failures directory and carried on.
> However, the file had actually been assigned to the tablet successfully on the first attempt. The following 2 attempts logged that the server had already been assigned this file. Shortly afterward a query came in (and later major compactions), which complained that the file could not be found because the bulk import had moved it to the failures directory.
> I think in this case we need some sort of final validation that the record didn't end up in the metadata table before we move the file to the failures directory.
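
A minimal sketch of the final validation suggested above, assuming the bulk import can ask whether the metadata table already references the file. This is not Accumulo's code: `MetadataView`, `Outcome`, `resolveAfterRetries`, and the file names are hypothetical, and a real check would have to read the metadata table rather than an in-memory set. The point it illustrates is that a socket timeout means the outcome is unknown, not that the assignment failed.

```java
import java.util.HashSet;
import java.util.Set;

final class BulkImportFailureCheck {

    /** Hypothetical lookup: does the metadata table already reference this file? */
    interface MetadataView {
        boolean isFileReferenced(String file);
    }

    enum Outcome { KEEP_IN_PLACE, MOVE_TO_FAILURES }

    /**
     * Decide what to do with a file once all assignment attempts are done.
     * A timed-out RPC may still have been applied by the server, so the
     * file is moved to failures only if no tablet references it.
     */
    static Outcome resolveAfterRetries(String file,
                                       boolean anyAttemptAcknowledged,
                                       MetadataView metadata) {
        if (anyAttemptAcknowledged) {
            return Outcome.KEEP_IN_PLACE;
        }
        // Final validation: the server may have recorded the file even
        // though the client never saw a successful reply.
        if (metadata.isFileReferenced(file)) {
            return Outcome.KEEP_IN_PLACE;
        }
        return Outcome.MOVE_TO_FAILURES;
    }

    public static void main(String[] args) {
        // Stand-in for the metadata table (hypothetical file names).
        Set<String> referenced = new HashSet<>();
        referenced.add("bulk/file-a.rf");

        // All three attempts timed out, but the first one was applied.
        System.out.println(resolveAfterRetries("bulk/file-a.rf", false, referenced::contains));
        // All three attempts timed out and nothing was recorded.
        System.out.println(resolveAfterRetries("bulk/file-b.rf", false, referenced::contains));
    }
}
```

In the scenario described in the report, the first call corresponds to the file that was assigned on the first attempt: the check would leave it in place instead of moving it out from under the tablet.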


