You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2015/08/12 02:48:45 UTC

[jira] [Comment Edited] (HBASE-14209) [0.98] TestShell visibility tests failing

    [ https://issues.apache.org/jira/browse/HBASE-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682445#comment-14682445 ] 

Andrew Purtell edited comment on HBASE-14209 at 8/12/15 12:48 AM:
------------------------------------------------------------------

So, the problem here is a missed ZooKeeper watch notification. I modified TestShell to set the default log level to DEBUG and can confirm we get all the way through to where the visibility label cache processes the client request and writes data to zookeeper, but then the expected cache update via watcher trigger doesn't happen. This seems specific to 0.98 because I've not seen this test failure with branch-1.0 or higher.

I'm attaching a patch for VisibilityLabelsCache that works around the problem by updating local state before writing it to zookeeper. It's a hack, but using ZK as notification bus for this is itself a hack slated to be replaced by proc v2. The patch has three hunks. The first two fix where we are not protecting 'ordinalVsLabels' with the RWlock and should be applied everywhere:

{code}
@@ -206,27 +206,37 @@ public class VisibilityLabelsCache implements VisibilityLabelOrdinalProvider {
   }
 
   public List<String> getUserAuths(String user) {
-    List<String> auths = EMPTY_LIST;
-    Set<Integer> authOrdinals = getUserAuthsAsOrdinals(user);
-    if (!authOrdinals.equals(EMPTY_SET)) {
-      auths = new ArrayList<String>(authOrdinals.size());
-      for (Integer authOrdinal : authOrdinals) {
-        auths.add(ordinalVsLabels.get(authOrdinal));
+    this.lock.readLock().lock();
+    try {
+      List<String> auths = EMPTY_LIST;
+      Set<Integer> authOrdinals = getUserAuthsAsOrdinals(user);
+      if (!authOrdinals.equals(EMPTY_SET)) {
+        auths = new ArrayList<String>(authOrdinals.size());
+        for (Integer authOrdinal : authOrdinals) {
+          auths.add(ordinalVsLabels.get(authOrdinal));
+        }
       }
+      return auths;
+    } finally {
+      this.lock.readLock().unlock();
     }
-    return auths;
   }
 
   public List<String> getGroupAuths(String[] groups) {
-    List<String> auths = EMPTY_LIST;
-    Set<Integer> authOrdinals = getGroupAuthsAsOrdinals(groups);
-    if (!authOrdinals.equals(EMPTY_SET)) {
-      auths = new ArrayList<String>(authOrdinals.size());
-      for (Integer authOrdinal : authOrdinals) {
-        auths.add(ordinalVsLabels.get(authOrdinal));
+    this.lock.readLock().lock();
+    try {
+      List<String> auths = EMPTY_LIST;
+      Set<Integer> authOrdinals = getGroupAuthsAsOrdinals(groups);
+      if (!authOrdinals.equals(EMPTY_SET)) {
+        auths = new ArrayList<String>(authOrdinals.size());
+        for (Integer authOrdinal : authOrdinals) {
+          auths.add(ordinalVsLabels.get(authOrdinal));
+        }
       }
+      return auths;
+    } finally {
+      this.lock.readLock().unlock();
     }
-    return auths;
   }
 
   /**
{code}

The third hunk is the test fix hack for 0.98 and doesn't necessarily need to be applied to later code lines. The effect is the local state may be updated twice - on one cluster node only, where the client call is processed - once when writeToZookeeper() is called, later again when processing a watcher notification. This can be expensive, but updates to this data are expected to be infrequent, and this is less serious than missing the watcher notification and failing to update the cached data.

{code}
@@ -270,7 +280,15 @@ public class VisibilityLabelsCache implements VisibilityLabelOrdinalProvider {
     }
   }
 
-  public void writeToZookeeper(byte[] data, boolean labelsOrUserAuths) {
+  public void writeToZookeeper(byte[] data, boolean labelsOrUserAuths) throws IOException {
+    // Update local state, then send it to zookeeper
+    if (labelsOrUserAuths) {
+      // True for labels
+      this.refreshLabelsCache(data);
+    } else {
+      // False for user auths
+      this.refreshUserAuthsCache(data);
+    }
     this.zkVisibilityWatcher.writeToZookeeper(data, labelsOrUserAuths);
   }
 }
{code}

What do you think [~anoop.hbase], [~ram_krish] ?



was (Author: apurtell):
So, the problem here is a missed ZooKeeper watch notification. I modified TestShell to set the default log level to DEBUG and can confirm we get all the way through to where the visibility label cache processes the client request and writes data to zookeeper, but then the expected cache update via watcher trigger doesn't happen. This seems specific to 0.98 because I've not seen this test failure with branch-1.0 or higher.

I'm attaching a patch for VisibilityLabelsCache that works around the problem by updating local state before writing it to zookeeper. It's a hack, but using ZK as notification bus for this is itself a hack slated to be replaced by proc v2. The patch has three hunks. The first two fix where we are not protecting 'ordinalVsLabels' with the RWlock and should be applied everywhere:

{code}
@@ -206,27 +206,37 @@ public class VisibilityLabelsCache implements VisibilityLabelOrdinalProvider {
   }
 
   public List<String> getUserAuths(String user) {
-    List<String> auths = EMPTY_LIST;
-    Set<Integer> authOrdinals = getUserAuthsAsOrdinals(user);
-    if (!authOrdinals.equals(EMPTY_SET)) {
-      auths = new ArrayList<String>(authOrdinals.size());
-      for (Integer authOrdinal : authOrdinals) {
-        auths.add(ordinalVsLabels.get(authOrdinal));
+    this.lock.readLock().lock();
+    try {
+      List<String> auths = EMPTY_LIST;
+      Set<Integer> authOrdinals = getUserAuthsAsOrdinals(user);
+      if (!authOrdinals.equals(EMPTY_SET)) {
+        auths = new ArrayList<String>(authOrdinals.size());
+        for (Integer authOrdinal : authOrdinals) {
+          auths.add(ordinalVsLabels.get(authOrdinal));
+        }
       }
+      return auths;
+    } finally {
+      this.lock.readLock().unlock();
     }
-    return auths;
   }
 
   public List<String> getGroupAuths(String[] groups) {
-    List<String> auths = EMPTY_LIST;
-    Set<Integer> authOrdinals = getGroupAuthsAsOrdinals(groups);
-    if (!authOrdinals.equals(EMPTY_SET)) {
-      auths = new ArrayList<String>(authOrdinals.size());
-      for (Integer authOrdinal : authOrdinals) {
-        auths.add(ordinalVsLabels.get(authOrdinal));
+    this.lock.readLock().lock();
+    try {
+      List<String> auths = EMPTY_LIST;
+      Set<Integer> authOrdinals = getGroupAuthsAsOrdinals(groups);
+      if (!authOrdinals.equals(EMPTY_SET)) {
+        auths = new ArrayList<String>(authOrdinals.size());
+        for (Integer authOrdinal : authOrdinals) {
+          auths.add(ordinalVsLabels.get(authOrdinal));
+        }
       }
+      return auths;
+    } finally {
+      this.lock.readLock().unlock();
     }
-    return auths;
   }
 
   /**
{code}

The third hunk is the test fix hack for 0.98 and doesn't necessarily need to be applied to later code lines. The effect is the local state may be updated twice, once when writeToZookeeper() is called, later again when processing a watcher notification. This can be expensive, but updates to this data are expected to be infrequent, and this is less serious than missing the watcher notification and failing to update the cached data.

{code}
@@ -270,7 +280,15 @@ public class VisibilityLabelsCache implements VisibilityLabelOrdinalProvider {
     }
   }
 
-  public void writeToZookeeper(byte[] data, boolean labelsOrUserAuths) {
+  public void writeToZookeeper(byte[] data, boolean labelsOrUserAuths) throws IOException {
+    // Update local state, then send it to zookeeper
+    if (labelsOrUserAuths) {
+      // True for labels
+      this.refreshLabelsCache(data);
+    } else {
+      // False for user auths
+      this.refreshUserAuthsCache(data);
+    }
     this.zkVisibilityWatcher.writeToZookeeper(data, labelsOrUserAuths);
   }
 }
{code}

What do you think [~anoop.hbase], [~ram_krish] ?


> [0.98] TestShell visibility tests failing
> -----------------------------------------
>
>                 Key: HBASE-14209
>                 URL: https://issues.apache.org/jira/browse/HBASE-14209
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 0.98.14
>
>         Attachments: HBASE-14209-0.98.patch
>
>
> This is after HBASE-14105  but we've seen this earlier where adding ruby units to the shell tests can cause the visibility tests to fail inexplicably. We can't just avoid adding ruby units to TestShell in 0.98 so figure out the root cause and fix or disable these. 
> {noformat}
>   1) Error:
> test_The_get/put_methods_should_work_for_data_written_with_Visibility(Hbase::VisibilityLabelsAdminMethodsTest):
> NativeException: junit.framework.AssertionFailedError: Waiting timed out after [10,000] msec
>     junit/framework/Assert.java:57:in `fail'
>     org/apache/hadoop/hbase/Waiter.java:193:in `waitFor'
>     org/apache/hadoop/hbase/Waiter.java:128:in `waitFor'
>     org/apache/hadoop/hbase/HBaseTestingUtility.java:3514:in `waitFor'
>     org/apache/hadoop/hbase/HBaseTestingUtility.java:3576:in `waitLabelAvailable'
>     ./src/test/ruby/hbase/visibility_labels_admin_test.rb:73:in `test_The_get/put_methods_should_work_for_data_written_with_Visibility'
>     org/jruby/RubyProc.java:270:in `call'
>     org/jruby/RubyKernel.java:2105:in `send'
>     org/jruby/RubyArray.java:1620:in `each'
>     org/jruby/RubyArray.java:1620:in `each'
>   2) Error:
> test_The_set/clear_methods_should_work_with_authorizations(Hbase::VisibilityLabelsAdminMethodsTest):
> NativeException: junit.framework.AssertionFailedError: Waiting timed out after [10,000] msec
>     junit/framework/Assert.java:57:in `fail'
>     org/apache/hadoop/hbase/Waiter.java:193:in `waitFor'
>     org/apache/hadoop/hbase/Waiter.java:128:in `waitFor'
>     org/apache/hadoop/hbase/HBaseTestingUtility.java:3514:in `waitFor'
>     org/apache/hadoop/hbase/HBaseTestingUtility.java:3576:in `waitLabelAvailable'
>     ./src/test/ruby/hbase/visibility_labels_admin_test.rb:52:in `test_The_set/clear_methods_should_work_with_authorizations'
>     org/jruby/RubyProc.java:270:in `call'
>     org/jruby/RubyKernel.java:2105:in `send'
>     org/jruby/RubyArray.java:1620:in `each'
>     org/jruby/RubyArray.java:1620:in `each'
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)