Posted to notifications@accumulo.apache.org by "Keith Turner (JIRA)" <ji...@apache.org> on 2014/08/29 22:19:53 UTC
[jira] [Assigned] (ACCUMULO-3096) Scans stuck and seeing error message about constraint violation
[ https://issues.apache.org/jira/browse/ACCUMULO-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keith Turner reassigned ACCUMULO-3096:
--------------------------------------
Assignee: Keith Turner
> Scans stuck and seeing error message about constraint violation
> ---------------------------------------------------------------
>
> Key: ACCUMULO-3096
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3096
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Keith Turner
> Assignee: Keith Turner
> Fix For: 1.6.1, 1.7.0
>
>
> Just helped someone debug an issue. Their scans were getting stuck on a certain tserver (identified by turning on debug in the shell). On that tserver, there was a constant stream of messages about a metadata table constraint violation because {{Bulk load transaction no longer running}}.
> The following code in {{Tablet.importMapFiles()}}
> {code:java}
> synchronized (timeLock) {
>   if (bulkTime > persistedTime)
>     persistedTime = bulkTime;
>   MetadataTableUtil.updateTabletDataFile(tid, extent, paths, tabletTime.getMetadataValue(persistedTime), creds, tabletServer.getLock());
> }
> {code}
> Ended up calling the following code in {{MetadataTableUtil}}.
> {code:java}
> public static void update(Credentials credentials, ZooLock zooLock, Mutation m, KeyExtent extent) {
>   Writer t = extent.isMeta() ? getRootTable(credentials) : getMetadataTable(credentials);
>   if (zooLock != null)
>     putLockID(zooLock, m);
>   while (true) {
>     try {
>       t.update(m);
>       return;
>     } catch (AccumuloException e) {
>       log.error(e, e);
>     } catch (AccumuloSecurityException e) {
>       log.error(e, e);
>     } catch (ConstraintViolationException e) {
>       log.error(e, e);
>     } catch (TableNotFoundException e) {
>       log.error(e, e);
>     }
>     UtilWaitThread.sleep(1000);
>   }
> }
> {code}
> So when the constraint failed, it retried forever. It did this while holding timeLock, which in turn prevented compactions from completing, which eventually gummed up scans.
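
One possible way to avoid the endless loop described above is to treat a constraint violation as permanent and to bound retries for other failures. The following is only a minimal sketch under stated assumptions, not the fix applied in Accumulo: {{MetadataUpdateSketch}}, its nested {{Writer}} interface, {{TransientWriteException}}, and the {{maxAttempts}} and {{sleepMillis}} parameters are hypothetical stand-ins, and the mutation is reduced to a plain string.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-ins for the types named in the report; not the Accumulo API. */
final class MetadataUpdateSketch {

  /** Permanent failure: the table's constraints rejected the mutation. */
  static class ConstraintViolationException extends Exception {
    ConstraintViolationException(String msg) { super(msg); }
  }

  /** Failure that might succeed if the write is attempted again (assumed category). */
  static class TransientWriteException extends Exception {
    TransientWriteException(String msg) { super(msg); }
  }

  /** Minimal writer abstraction used only for this sketch. */
  interface Writer {
    void update(String mutation) throws ConstraintViolationException, TransientWriteException;
  }

  /**
   * Writes the mutation, retrying transient failures at most maxAttempts times.
   * A constraint violation is never retried, because resending the same
   * mutation would be rejected again. Returns the number of attempts used.
   */
  static int update(Writer writer, String mutation, int maxAttempts, long sleepMillis)
      throws ConstraintViolationException, TransientWriteException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    TransientWriteException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        // A ConstraintViolationException propagates to the caller immediately.
        writer.update(mutation);
        return attempt;
      } catch (TransientWriteException e) {
        last = e;
        if (attempt < maxAttempts) {
          TimeUnit.MILLISECONDS.sleep(sleepMillis);
        }
      }
    }
    // Every attempt failed with a transient error; surface the last one.
    throw last;
  }
}
```

With this shape, a caller that holds a lock such as {{timeLock}} gets the exception back instead of blocking indefinitely, and can release the lock and decide how to handle the failed bulk import.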