You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2022/04/05 13:57:55 UTC

[GitHub] [zookeeper] kezhuw opened a new pull request, #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

kezhuw opened a new pull request, #1852:
URL: https://github.com/apache/zookeeper/pull/1852

   This test writes txns to trigger snapshot and expects some txns remain
   in txn log. But snapshot taking is asynchronous, thus all txns could be
   written to snapshot. So in restarting, it is possible that no txns to
   load after snapshot restored. This will fail assertion.
   
   This commit solves this by disable automic snapshot taking by
   `SyncRequestProcessor.setSnapCount(Integer.MAX_VALUE)`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] kezhuw commented on pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
kezhuw commented on PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#issuecomment-1162954259

   Ci fails due to unrelated flaky test [ZOOKEEPER-4308](https://issues.apache.org/jira/browse/ZOOKEEPER-4308)(#1851).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] kezhuw commented on a diff in pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
kezhuw commented on code in PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#discussion_r843376225


##########
zookeeper-server/src/test/java/org/apache/zookeeper/server/persistence/FileTxnSnapLogMetricsTest.java:
##########
@@ -43,38 +40,20 @@
 
     private static final Logger LOG = LoggerFactory.getLogger(FileTxnSnapLogMetricsTest.class);
 
-    CountDownLatch allCreatedLatch;
-
-    private class MockWatcher implements Watcher {
-
-        @Override
-        public void process(WatchedEvent e) {
-            LOG.info("all nodes created");
-            allCreatedLatch.countDown();
-        }
-
-    }
-
     @Test
     public void testFileTxnSnapLogMetrics() throws Exception {
-        SyncRequestProcessor.setSnapCount(100);
+        // disable automatic snapshot taking to leave writes in txn log
+        SyncRequestProcessor.setSnapCount(Integer.MAX_VALUE);
 

Review Comment:
   Not sure, but I think almost all tests run and should run in dedicated jvm(aka. false `reuseForks` in `zookeeper-server` module). This test is not an exception. Restoring a primitive value has no means in this case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] kezhuw commented on a diff in pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
kezhuw commented on code in PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#discussion_r843373366


##########
zookeeper-server/src/test/java/org/apache/zookeeper/server/persistence/FileTxnSnapLogMetricsTest.java:
##########
@@ -43,38 +40,20 @@
 
     private static final Logger LOG = LoggerFactory.getLogger(FileTxnSnapLogMetricsTest.class);
 
-    CountDownLatch allCreatedLatch;
-
-    private class MockWatcher implements Watcher {
-
-        @Override
-        public void process(WatchedEvent e) {
-            LOG.info("all nodes created");
-            allCreatedLatch.countDown();
-        }
-
-    }
-
     @Test
     public void testFileTxnSnapLogMetrics() throws Exception {
-        SyncRequestProcessor.setSnapCount(100);
+        // disable automatic snapshot taking to leave writes in txn log
+        SyncRequestProcessor.setSnapCount(Integer.MAX_VALUE);
 
         QuorumUtil util = new QuorumUtil(1);
         util.startAll();
 
-        allCreatedLatch = new CountDownLatch(1);
-
         byte[] data = new byte[500];
-        // make sure a snapshot is taken and some txns are not in a snapshot
         ZooKeeper zk = ClientBase.createZKClient(util.getConnString());
         for (int i = 0; i < 150; i++) {
             zk.create("/path" + i, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
         }
 
-        if (null == zk.exists("/path149", new MockWatcher())) {

Review Comment:
   `zk.create` above is synchronous. `zk.exists` will return non null stat if I understand correct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] symat commented on a diff in pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
symat commented on code in PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#discussion_r901700884


##########
zookeeper-server/src/test/java/org/apache/zookeeper/server/persistence/FileTxnSnapLogMetricsTest.java:
##########
@@ -43,38 +40,20 @@
 
     private static final Logger LOG = LoggerFactory.getLogger(FileTxnSnapLogMetricsTest.class);
 
-    CountDownLatch allCreatedLatch;
-
-    private class MockWatcher implements Watcher {
-
-        @Override
-        public void process(WatchedEvent e) {
-            LOG.info("all nodes created");
-            allCreatedLatch.countDown();
-        }
-
-    }
-
     @Test
     public void testFileTxnSnapLogMetrics() throws Exception {
-        SyncRequestProcessor.setSnapCount(100);
+        // disable automatic snapshot taking to leave writes in txn log
+        SyncRequestProcessor.setSnapCount(Integer.MAX_VALUE);
 

Review Comment:
   I'm not exactly sure how we run the tests... sometimes I even run them in IntelliJ during development. I think usually setting back all "static stuff" we changed in a test is a good practice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] kezhuw commented on a diff in pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
kezhuw commented on code in PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#discussion_r902455844


##########
zookeeper-server/src/test/java/org/apache/zookeeper/server/persistence/FileTxnSnapLogMetricsTest.java:
##########
@@ -43,38 +40,20 @@
 
     private static final Logger LOG = LoggerFactory.getLogger(FileTxnSnapLogMetricsTest.class);
 
-    CountDownLatch allCreatedLatch;
-
-    private class MockWatcher implements Watcher {
-
-        @Override
-        public void process(WatchedEvent e) {
-            LOG.info("all nodes created");
-            allCreatedLatch.countDown();
-        }
-
-    }
-
     @Test
     public void testFileTxnSnapLogMetrics() throws Exception {
-        SyncRequestProcessor.setSnapCount(100);
+        // disable automatic snapshot taking to leave writes in txn log
+        SyncRequestProcessor.setSnapCount(Integer.MAX_VALUE);
 

Review Comment:
   Cleanup is always a complicated problem. No ones run in identical environment. 
   
   > setting back all "static stuff" we changed in a test is a good practice.
   
   It is actually hard to follow. Given `SyncRequestProcessor.setSnapCount` as an example, there are 20+ usages in test, only `RecoveryTest.testRecovery` restores.
   
   Personally, I treat all tests as oneshot and resort to junit's `@Rule`(eg. `@TempDir`) or similars to cleanup persistent resources.
   
   Anyway, I will add a `@AfterEach` to restore it back in next fixup. 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] symat commented on pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
symat commented on PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#issuecomment-1164039781

   I really like your solution @kezhuw - you improved this test a lot. Thanks!
   
   I think we are good to merge this. We should have the flaky test fixed in older branches too, if the patch applies smoothly. I'm a bit overloaded today. @eolivelli feel free to merge this. Otherwise I'll try to do it tomorrow. Thanks!!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] asfgit closed pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics
URL: https://github.com/apache/zookeeper/pull/1852


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] eolivelli commented on a diff in pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
eolivelli commented on code in PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#discussion_r843073938


##########
zookeeper-server/src/test/java/org/apache/zookeeper/server/persistence/FileTxnSnapLogMetricsTest.java:
##########
@@ -43,38 +40,20 @@
 
     private static final Logger LOG = LoggerFactory.getLogger(FileTxnSnapLogMetricsTest.class);
 
-    CountDownLatch allCreatedLatch;
-
-    private class MockWatcher implements Watcher {
-
-        @Override
-        public void process(WatchedEvent e) {
-            LOG.info("all nodes created");
-            allCreatedLatch.countDown();
-        }
-
-    }
-
     @Test
     public void testFileTxnSnapLogMetrics() throws Exception {
-        SyncRequestProcessor.setSnapCount(100);
+        // disable automatic snapshot taking to leave writes in txn log
+        SyncRequestProcessor.setSnapCount(Integer.MAX_VALUE);
 
         QuorumUtil util = new QuorumUtil(1);
         util.startAll();
 
-        allCreatedLatch = new CountDownLatch(1);
-
         byte[] data = new byte[500];
-        // make sure a snapshot is taken and some txns are not in a snapshot
         ZooKeeper zk = ClientBase.createZKClient(util.getConnString());
         for (int i = 0; i < 150; i++) {
             zk.create("/path" + i, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
         }
 
-        if (null == zk.exists("/path149", new MockWatcher())) {

Review Comment:
   Why are we removing this part?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] kezhuw commented on pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
kezhuw commented on PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#issuecomment-1161553240

   @symat I think you are right about what the test try to express. But the original test does not assert that "we have a snapshot"(aka. "cnt_snapshottime"). That is why this fix pass. I think we could utilize `ZKTestCase.waitFor` to wait a snapshot and `FileSnap.close` to logs more txns after snapshot paused. I will push a fixup commit soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] eolivelli commented on a diff in pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
eolivelli commented on code in PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#discussion_r843073437


##########
zookeeper-server/src/test/java/org/apache/zookeeper/server/persistence/FileTxnSnapLogMetricsTest.java:
##########
@@ -43,38 +40,20 @@
 
     private static final Logger LOG = LoggerFactory.getLogger(FileTxnSnapLogMetricsTest.class);
 
-    CountDownLatch allCreatedLatch;
-
-    private class MockWatcher implements Watcher {
-
-        @Override
-        public void process(WatchedEvent e) {
-            LOG.info("all nodes created");
-            allCreatedLatch.countDown();
-        }
-
-    }
-
     @Test
     public void testFileTxnSnapLogMetrics() throws Exception {
-        SyncRequestProcessor.setSnapCount(100);
+        // disable automatic snapshot taking to leave writes in txn log
+        SyncRequestProcessor.setSnapCount(Integer.MAX_VALUE);
 

Review Comment:
   We should reset this in a finally block.
   That was a pre existing problem 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] eolivelli commented on pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
eolivelli commented on PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#issuecomment-1160388021

   @maoling @symat @ztzg PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] symat commented on pull request #1852: ZOOKEEPER-4511: Fix flaky test FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

Posted by GitBox <gi...@apache.org>.
symat commented on PR #1852:
URL: https://github.com/apache/zookeeper/pull/1852#issuecomment-1165591879

   thanks @kezhuw for fixing this test!!
   I pushed it to master, branch-3.8 and branch-3.7. The cherry-pick on branch-3.6 wasn't clean, so we will need a separate PR there, if we want to push this to branch-3.6.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org