Posted to notifications@accumulo.apache.org by "Josh Elser (JIRA)" <ji...@apache.org> on 2016/08/03 17:57:20 UTC

[jira] [Created] (ACCUMULO-4398) Possible for client to see TableNotFoundException adding splits on a newly created table

Josh Elser created ACCUMULO-4398:
------------------------------------

             Summary: Possible for client to see TableNotFoundException adding splits on a newly created table
                 Key: ACCUMULO-4398
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4398
             Project: Accumulo
          Issue Type: Bug
          Components: client, zookeeper
            Reporter: Josh Elser


Just came across a really odd scenario. I believe that it's a race condition in the client that stems from our beloved {{ZooCache}}.

This was observed via a test failure in {{LogicalTimeIT}}:

{noformat}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 29.249 sec <<< FAILURE! - in org.apache.accumulo.test.functional.LogicalTimeIT
run(org.apache.accumulo.test.functional.LogicalTimeIT)  Time elapsed: 29.037 sec  <<< ERROR!
org.apache.accumulo.core.client.TableNotFoundException: Table LogicalTimeIT_run06 does not exist
	at org.apache.accumulo.core.client.impl.Tables._getTableId(Tables.java:117)
	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:102)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.addSplits(TableOperationsImpl.java:374)
	at org.apache.accumulo.test.functional.LogicalTimeIT.runMergeTest(LogicalTimeIT.java:81)
	at org.apache.accumulo.test.functional.LogicalTimeIT.run(LogicalTimeIT.java:56)
{noformat}

Ultimately:

{code}
    conn.tableOperations().create(table, new NewTableConfiguration().setTimeType(TimeType.LOGICAL));
    TreeSet<Text> splitSet = new TreeSet<Text>();
    for (String split : splits) {
      splitSet.add(new Text(split));
    }
    conn.tableOperations().addSplits(table, splitSet);
{code}

Two points matter here. First, a ZooKeeper client that has set a watcher will eventually receive all updates from that watcher, in the order in which they occurred. Second, {{LogicalTimeIT}} repeatedly runs the same test over tables of varying characteristics.

Consider the following:

# Client creates a table T1
# ZooCache is cleared after FATE op finishes
# Watcher is set on ZTABLES in ZK
# Client interacts with T1
# Client creates T2
# ZooCache is cleared after FATE op finishes
# Watcher fires on ZTABLES node in ZK (CHILDREN_CHANGED) and repopulates the child list on the ZTABLES node
# Client makes call to split T2
# Code will check if the table exists, but the childrenCache will be repopulated in ZooCache, which will cause the client to think the table doesn't exist
# Eventually, the watcher would fire and ZTABLES would be updated and everything is fine.
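
The sequence above can be replayed with a small toy model. This is only a sketch: {{FakeZTables}} and {{FakeChildrenCache}} are hypothetical stand-ins, not Accumulo or ZooKeeper classes, and the model assumes (without confirmation from the real code) that the child list re-read in step 7 does not yet include T2. The numbered comments map to the steps in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the suspected race. None of these types are real Accumulo/ZooKeeper APIs. */
public class StaleChildrenCacheDemo {

  /** Hypothetical stand-in for the ZTABLES node. */
  static final class FakeZTables {
    private final List<String> updates = new ArrayList<>(); // table creations, in order
    // ASSUMPTION: what the client can read may lag behind what has been written.
    private int visibleToClient = 0;

    void create(String table) {
      updates.add(table);
    }

    /** Makes every update so far visible to the client (the "eventually" in step 10). */
    void catchUp() {
      visibleToClient = updates.size();
    }

    /** What a client read returns right now: an in-order prefix of the updates. */
    Set<String> readChildren() {
      return new TreeSet<>(updates.subList(0, visibleToClient));
    }
  }

  /** Hypothetical stand-in for the children cache inside ZooCache. */
  static final class FakeChildrenCache {
    private final FakeZTables zk;
    private Set<String> cached; // null means "nothing cached"

    FakeChildrenCache(FakeZTables zk) {
      this.zk = zk;
    }

    void clear() {
      cached = null;
    }

    /** Watcher callback: re-read the child list and cache whatever comes back. */
    void onChildrenChanged() {
      cached = zk.readChildren();
    }

    boolean tableExists(String table) {
      if (cached == null) {
        cached = zk.readChildren(); // cache miss: populate from the fake ZooKeeper
      }
      return cached.contains(table);
    }
  }

  /** Replays the ten steps; returns what the client saw for T1, T2 (early), T2 (late). */
  public static boolean[] run() {
    FakeZTables zk = new FakeZTables();
    FakeChildrenCache cache = new FakeChildrenCache(zk);

    zk.create("T1");                           // 1. client creates T1
    zk.catchUp();                              //    T1 is visible to the client's reads
    cache.clear();                             // 2. cache cleared after the FATE op
    boolean t1 = cache.tableExists("T1");      // 3-4. cache populated, client uses T1

    zk.create("T2");                           // 5. client creates T2 (reads still lag)
    cache.clear();                             // 6. cache cleared again
    cache.onChildrenChanged();                 // 7. watcher fires, caches a list without T2
    boolean t2Early = cache.tableExists("T2"); // 8-9. existence check hits the stale list

    zk.catchUp();                              // 10. the later update finally arrives
    cache.onChildrenChanged();
    boolean t2Late = cache.tableExists("T2");

    return new boolean[] {t1, t2Early, t2Late};
  }

  public static void main(String[] args) {
    boolean[] seen = run();
    System.out.println("T1 visible: " + seen[0]);
    System.out.println("T2 visible right after create: " + seen[1]);
    System.out.println("T2 visible after the watcher catches up: " + seen[2]);
  }
}
```

In this model the middle check is the one that would surface as {{TableNotFoundException}}: clearing the cache in step 6 does not help, because the watcher repopulates it from a stale read before the client asks about T2.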

I believe this is a plausible scenario, though perhaps an unlikely one.


