You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/05/26 14:31:10 UTC

[GitHub] [accumulo] dlmarion opened a new issue, #2739: Broken or Flaky test: PermissionsIT

dlmarion opened a new issue, #2739:
URL: https://github.com/apache/accumulo/issues/2739

   PermissionsIT test methods are failing after merging #2707 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1142382048

   > https://github.com/apache/accumulo/pull/2740 isn't trying to solve any kind of global consistency across clients.
   
   Understood.
   
   > It's a much more modest solution: just do a better job of returning something at least as recent as any previous setProperty() operations that client might have issued.
   
   I think that will only happen if:
       * The client communicates with the same Accumulo server, or
       * The Accumulo server(s) are communicating with the same ZK server, or
       * All of the ZK servers have a consistent view of the information.
   
   > If we wanted an extra guarantee, we'd also zk.sync before re-reading, but clearing the cache to force it to read from ZK is much more stable than what's happening now.
   
   I agree. It should definitely help for the tests and should help in production as config changes are likely rare occurrences. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
EdColeman commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1139176446

   If the tests are setting multiple properties, it would help if the were bundled into an atomic set - that reduces the number of watch notifications that potentially need to be handled.
   
   With the way ZooKeeper events are propagated clients receive notifications asynchronously - they will eventuality be consistent (baring additional changes) 
   
   Possible architectural changes for mitigation:
     - send the properties with the scan request.
     - send a version id of the properties to use (this might require keeping multiple, tagged versions so that a specified version is always available)
     - send a timestamp that requests the properties have the same (or newer) timestamp. You could still end up with conflicts, but they would be more explicitly bound.
     


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1142041321

   If I'm remembering the problem correctly from last week, the client modifies some properties and the server watcher has not fired before the client fetches the property to verify that it's been set. Have we considered having `ZooPropStore.get(PropCacheKey)` call `ZooPropStore.readPropsFromZK(PropCacheKey)` to confirm that the cached version matches the version in ZK?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii closed issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
ctubbsii closed issue #2739: Broken or Flaky test: PermissionsIT
URL: https://github.com/apache/accumulo/issues/2739


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1139034972

   > These tests could be improved by looping, waiting for the expected change (relying on timing out to fail the test if the asserted condition never happens, rather than an assert statement)
   
   This could be made easier if we put the version number into the configuration. Then, the client could loop until a version of the properties come back with a version number greater than it had when it changed the configuration. For example:
   
   ```
   long version = client.tableOperations().getProperties(tableName).get(Property.CONFIGURATION_VERSION.getKey());
   // set iterator on table
   while (client.tableOperations().getProperties(tableName).get(Property.CONFIGURATION_VERSION.getKey() <= version) {
     Thread.sleep(250);
   }
   // move on
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1139144138

   That's one way to do it, without any architecture changes. However, it really only helps for tests, and we'd have to modify each test to do this. There could still be a problem with collisions with this solution, and there are also still situations where you see that it is updated on the one server you asked... but then you perform an operation (like a scan), and the tserver doing that operation still sees the old version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1144175536

   While not a 100% fix for the underlying issue, #2740 should fix the test instabilities we've seen.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
EdColeman commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1142202800

   Does it have an impact that configuration / properties are relatively stable?   Once set, they generally do not change often. The tests are rather synthetic that they set and then immediately expect to see the changes.  In a system, those changes are going to to take time to propagate and the design should recognize that ambiguity.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1138658455

   This may not be related to #2707 after all, it may be related to the ZooKeeper property changes. The test sets some arbitrary table properties then confirms that the counts match. The test does not fail all the time. The stack trace is:
   
   ```
   org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
   	at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
   	at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
   	at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
   	at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
   	at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:628)
   	at org.apache.accumulo.test.functional.PermissionsIT.testArbitraryProperty(PermissionsIT.java:877)
   	at org.apache.accumulo.test.functional.PermissionsIT.testGrantedTablePermission(PermissionsIT.java:804)
   	at org.apache.accumulo.test.functional.PermissionsIT.tablePermissionTest(PermissionsIT.java:641)
   ...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1142158068

   I went back and re-read the ZooKeeper [Consistency Guarantees](https://zookeeper.apache.org/doc/r3.8.0/zookeeperProgrammers.html#ch_zkGuarantees). I think the note at the end means that my suggestion wont work. I think it also means that #2740 and #2741 wont work either in the case where there is more than one ZK node. #2740 and #2741 may make the tests work, but it will be because of a side-effect of the number of ZK servers that MAC starts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1142099689

   > If I'm remembering the problem correctly from last week, the client modifies some properties and the server watcher has not fired before the client fetches the property to verify that it's been set. Have we considered having `ZooPropStore.get(PropCacheKey)` call `ZooPropStore.readPropsFromZK(PropCacheKey)` to confirm that the cached version matches the version in ZK?
   
   That's basically equivalent to expiring the cache, which forces it to reload from ZK, which is what PR #2740 (and #2741) do.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1142230246

   I agree, I think this is an issue that likely only affects the tests. Can we specify a different ZooPropStore for MAC? Maybe a subclass of ZooPropStore that removes the cached value and reloads it as part of the get() call?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1142309419

   > I agree, I think this is an issue that likely only affects the tests. Can we specify a different ZooPropStore for MAC? Maybe a subclass of ZooPropStore that removes the cached value and reloads it as part of the get() call?
   
   I don't think it's a good idea to run tests with a different prop store. That will cause us to lose testing coverage for the code that users will use.
   
   > I went back and re-read the ZooKeeper [Consistency Guarantees](https://zookeeper.apache.org/doc/r3.8.0/zookeeperProgrammers.html#ch_zkGuarantees). I think the note at the end means that my suggestion wont work. I think it also means that #2740 and #2741 wont work either in the case where there is more than one ZK node. #2740 and #2741 may make the tests work, but it will be because of a side-effect of the number of ZK servers that MAC starts.
   
   Those aren't just for fixing tests... they are any time the user asks our API for the configuration. This applies to tests, but users can do this also. #2740 isn't trying to solve any kind of global consistency across clients. It's a much more modest solution: just do a better job of returning something at least as recent as any previous setProperty() operations that client might have issued. If we wanted an extra guarantee, we'd also zk.sync before re-reading, but clearing the cache to force it to read from ZK is much more stable than what's happening now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1142109382

   > That's basically equivalent to expiring the cache, which forces it to reload from ZK, which is what PR https://github.com/apache/accumulo/pull/2740 (and https://github.com/apache/accumulo/pull/2741) do.
   
   I was just wondering if we should attempt to make the ZK clients (Accumulo server processes) more consistent. I know there is a cost to it, but if the server processes were more consistent then we would not need to invalidate the cache on a client call. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1138911335

   I looked at several recently flaky tests... and they all seem to be the same type: do something that changes a property in ZK, then check that it was set to verify. I think what's happening is that the new prop store cache lookup is much faster now than with ZooCache previously, which looked up one property at a time. Because the cache lookup is faster, it's likely much faster than the watcher that triggers an update to the cache, and we're more likely to see stale values.
   
   These tests could be improved by looping, waiting for the expected change (relying on timing out to fail the test if the asserted condition never happens, rather than an assert statement), or we can do something to force a sync on the server side request handling when users have a client request to view properties, so the client sees recent edits. I'd probably prefer the latter, so that way it improves the user experience, and we don't have to chase these tests one by one.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on issue #2739: Broken or Flaky test: PermissionsIT

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2739:
URL: https://github.com/apache/accumulo/issues/2739#issuecomment-1139247549

   > If the tests are setting multiple properties, it would help if the were bundled into an atomic set - that reduces the number of watch notifications that potentially need to be handled.
   
   That will be easier after #2717 is done
   
   > 
   > With the way ZooKeeper events are propagated clients receive notifications asynchronously - they will eventuality be consistent (baring additional changes)
   > 
   > Possible architectural changes for mitigation:
   > 
   > * send the properties with the scan request.
   > * send a version id of the properties to use (this might require keeping multiple, tagged versions so that a specified version is always available)
   
   It might be nice to pass a generic metadata structure over every RPC call, so we can pass arbitrary state information to servers. We currently only send trace information that way, but we could expand that to include information about the caller's view of the configuration(s). The server could check to make sure that it is always at least as new as that version, and expire any cached items that are older, before proceeding. I wouldn't want to keep multiple versions, though. It should be fine as long as the server's view is at least as up-to-date as the client's. We don't need full isolated configuration... we just need to order client requests that use properties after processing any previous client requests to update properties.
   
   We would also need to have the property setter RPC methods return a version once they are set, so the client knows what version they just set.
   
   > * send a timestamp that requests the properties have the same (or newer) timestamp. You could still end up with conflicts, but they would be more explicitly bound.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org