Posted to notifications@accumulo.apache.org by "Eric Newton (JIRA)" <ji...@apache.org> on 2015/10/15 21:44:05 UTC
[jira] [Created] (ACCUMULO-4031) consistency check failure
Eric Newton created ACCUMULO-4031:
-------------------------------------
Summary: consistency check failure
Key: ACCUMULO-4031
URL: https://issues.apache.org/jira/browse/ACCUMULO-4031
Project: Accumulo
Issue Type: Bug
Components: tserver
Affects Versions: 1.6.4
Environment: Very large production cluster
Reporter: Eric Newton
Sorry for the lack of concrete details, but my logs are not online.
This system does a lot of bulk ingest. When it was shut down, a few tablets complained about inconsistency of their file list with what was in the metadata table.
I tracked one down; the others appear to be similar.
First, the inconsistency was an "extra" bulk import file in the metadata table, which was missing from the in-memory list.
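For illustration only, the kind of comparison that flags this inconsistency can be sketched as a set difference between the two file lists. This is not Accumulo's actual check; the class name, method name, and file names below are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compares a tablet's in-memory file list with the
// file list recorded for it in the metadata table.
public class FileListCheck {

    // Returns the files present in the metadata table but absent from the
    // in-memory list (the "extra" files described in this report).
    static Set<String> extraInMetadata(Set<String> metadataFiles, Set<String> inMemoryFiles) {
        Set<String> extra = new TreeSet<>(metadataFiles);
        extra.removeAll(inMemoryFiles);
        return extra;
    }

    public static void main(String[] args) {
        // Made-up file names; the second one stands in for the bulk import file.
        Set<String> inMemory = new TreeSet<>(Arrays.asList("F0001.rf"));
        Set<String> metadata = new TreeSet<>(Arrays.asList("F0001.rf", "I0002.rf"));

        // prints "extra in metadata: [I0002.rf]"
        System.out.println("extra in metadata: " + extraInMetadata(metadata, inMemory));
    }
}
```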
A bulk load of the file into the tablet was attempted, but it failed due to a constraint violation: the bulk transaction was no longer running.
Except, really, it was. The constraint fired during the update of the tablets' metadata. The server hosting the metadata tablet was having a (brief) connection problem with ZooKeeper, which is where the bulk transaction status is stored.
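A minimal sketch of how a status check could report a violation for a transaction that is still running, assuming the check treats an unreachable status store as "not running". This is an assumption made for illustration, not Accumulo's actual constraint code; TxStatusStore, violates, and the transaction id are hypothetical.

```java
import java.io.IOException;

// Hypothetical sketch of a constraint that depends on an external status store.
public class BulkTxConstraintSketch {

    // Stand-in for the place where bulk transaction status is kept.
    interface TxStatusStore {
        boolean isRunning(long tid) throws Exception;
    }

    // Returns true if the update should be rejected as a constraint violation.
    static boolean violates(long tid, TxStatusStore store) {
        try {
            return !store.isRunning(tid);
        } catch (Exception e) {
            // Status is unknown. This sketch rejects the update, so a brief
            // connection problem looks the same as a finished transaction.
            return true;
        }
    }

    public static void main(String[] args) {
        TxStatusStore healthy = tid -> true;
        TxStatusStore unreachable = tid -> { throw new IOException("connection loss"); };

        System.out.println(violates(42L, healthy));      // prints "false"
        System.out.println(violates(42L, unreachable));  // prints "true"
    }
}
```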
The importing tablet server saw the constraint violation and did not add the file to the in-memory list. However, 2 minutes later, the bulk import was retried and, consulting the metadata table, it claimed the file *was* already imported.
So, in the intervening 2 minutes, somehow the update was made to the metadata tablet.
* No splitting of either tablet occurred during this event.
* Neither tablet was moved during this event.
* No recovery of the metadata table took place.
* The tablet server never reported the file imported.
I reviewed the handling of constraints, and it looks correct (despite ACCUMULO-4029).
With ACCUMULO-3327, the tablet server will not reject retries, because it does not re-consult the metadata table.
I don't know how the mutations would be applied without the tablet server reporting the file as loaded.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)