You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by GitBox <gi...@apache.org> on 2022/06/21 15:34:03 UTC

[GitHub] [zeppelin] Reamer opened a new pull request, #4382: [ZEPPELIN-5754] Init Notes in background

Reamer opened a new pull request, #4382:
URL: https://github.com/apache/zeppelin/pull/4382

   ### What is this PR for?
   This pull request introduces a mechanism to initialize Notes in the background.
   A consumer is passed, which is then called with the NoteId.
   The consumer itself is responsible for checking, for example, whether the note still exists.
   The initialization runs in several threads. The consumer must therefore be thread-safe.
   This approach scales well, in my opinion, because it makes effective use of NoteCache. Another approach, e.g. to initialize the Notes in the sub-module, would not utilize the NoteCache as effectively. This is true if one sub-module is faster than the other.
   
   ### What type of PR is it?
    - Improvement
   
   ### What is the Jira issue?
   * https://issues.apache.org/jira/browse/ZEPPELIN-5754
   
   ### How should this be tested?
   * CI
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * Does the licenses files need to update? No
   * Is there breaking changes for older versions? No
   * Does this needs documentation? No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] jongyoul commented on a diff in pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
jongyoul commented on code in PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#discussion_r912501438


##########
zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java:
##########
@@ -109,6 +119,60 @@ public void recoveryIfNecessary() {
     }
   }
 
+  /**
+   * Subsystems can add a consumer, which is executed during initialization.
+   * Initialization is performed in parallel and with multiple threads.
+   *
+   * The consumer must be thread safe.
+   *
+   * @param initConsumer Consumer, which is passed the NoteId.
+   */
+  public void addInitConsumer(Consumer<String> initConsumer) {
+    this.initConsumers.add(initConsumer);
+  }
+
+  /**
+   * Asynchronous and parallel initialization of notes (during startup)
+   */
+  public void initNotebook() {
+    initExecutor = new ThreadPoolExecutor(0, Runtime.getRuntime().availableProcessors(), 1, TimeUnit.MINUTES,

Review Comment:
   nit. Why don't you initialize it on the constructor or a static variable? If we call `initNotebook` multiple times by mistake, it can leak memory because `ThreaPoolExecutor` won't be shut down clearly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] Reamer commented on pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
Reamer commented on PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#issuecomment-1174794215

   I will include the changes in the master on Thursday unless further comments are received.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] Reamer commented on a diff in pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
Reamer commented on code in PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#discussion_r912783098


##########
zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java:
##########
@@ -109,6 +119,60 @@ public void recoveryIfNecessary() {
     }
   }
 
+  /**
+   * Subsystems can add a consumer, which is executed during initialization.
+   * Initialization is performed in parallel and with multiple threads.
+   *
+   * The consumer must be thread safe.
+   *
+   * @param initConsumer Consumer, which is passed the NoteId.
+   */
+  public void addInitConsumer(Consumer<String> initConsumer) {
+    this.initConsumers.add(initConsumer);
+  }
+
+  /**
+   * Asynchronous and parallel initialization of notes (during startup)
+   */
+  public void initNotebook() {
+    initExecutor = new ThreadPoolExecutor(0, Runtime.getRuntime().availableProcessors(), 1, TimeUnit.MINUTES,

Review Comment:
   The main reason is that in the method `waitForFinishInit` the ExecutorService is shut down to wait for the termination afterwards. When calling `initNotebook` again, a new ExecutorService must be created, because the old one might be closed.
   I will add a termination check in `initNotebook` to solve this problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] jongyoul commented on a diff in pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
jongyoul commented on code in PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#discussion_r903229036


##########
zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/scheduler/QuartzSchedulerService.java:
##########
@@ -51,123 +51,110 @@ public class QuartzSchedulerService implements SchedulerService {
   private final ZeppelinConfiguration zeppelinConfiguration;
   private final Notebook notebook;
   private final Scheduler scheduler;
-  private final Thread loadingNotesThread;
 
   @Inject
   public QuartzSchedulerService(ZeppelinConfiguration zeppelinConfiguration, Notebook notebook)
       throws SchedulerException {
     this.zeppelinConfiguration = zeppelinConfiguration;
     this.notebook = notebook;
     this.scheduler = getScheduler();
-    this.scheduler.getListenerManager().addJobListener(new CronJobListener());
+    this.scheduler.getListenerManager().addJobListener(new MetricCronJobListener());
+    this.scheduler.getListenerManager().addTriggerListener(new ZeppelinCronJobTriggerListerner());
     this.scheduler.start();
-
-    // Do in a separated thread because there may be many notes,
-    // loop all notes in the main thread may block the restarting of Zeppelin server
-    // TODO(zjffdu) It may cause issue when user delete note before this thread is finished
-    this.loadingNotesThread = new Thread(() -> {
-        LOGGER.info("Starting init cronjobs");
-        notebook.getNotesInfo().stream()
-                .forEach(entry -> {
-                  try {
-                    refreshCron(entry.getId());
-                  } catch (Exception e) {
-                    LOGGER.warn("Fail to refresh cron for note: {}", entry.getId());
-                  }
-                });
-        LOGGER.info("Complete init cronjobs");
-    });
-    loadingNotesThread.setName("Init CronJob Thread");
-    loadingNotesThread.setDaemon(true);
-    loadingNotesThread.start();
+    // Start Init
+    notebook.addInitConsumer(this::refreshCron);
   }
 
+
   private Scheduler getScheduler() throws SchedulerException {
     return new StdSchedulerFactory().getScheduler();
   }
 
-  /**
-   * This is only for testing, unit test should always call this method in setup() before testing.
-   */
-  @VisibleForTesting
-  public void waitForFinishInit() {
+  @PreDestroy

Review Comment:
   I'm just curious about it but does `@PreDestroy` work properly?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] jongyoul commented on a diff in pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
jongyoul commented on code in PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#discussion_r913555148


##########
zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java:
##########
@@ -109,6 +119,60 @@ public void recoveryIfNecessary() {
     }
   }
 
+  /**
+   * Subsystems can add a consumer, which is executed during initialization.
+   * Initialization is performed in parallel and with multiple threads.
+   *
+   * The consumer must be thread safe.
+   *
+   * @param initConsumer Consumer, which is passed the NoteId.
+   */
+  public void addInitConsumer(Consumer<String> initConsumer) {
+    this.initConsumers.add(initConsumer);
+  }
+
+  /**
+   * Asynchronous and parallel initialization of notes (during startup)
+   */
+  public void initNotebook() {
+    initExecutor = new ThreadPoolExecutor(0, Runtime.getRuntime().availableProcessors(), 1, TimeUnit.MINUTES,

Review Comment:
   Understood the intention. Thank you for the explanation. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] Reamer commented on a diff in pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
Reamer commented on code in PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#discussion_r912790957


##########
zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/Notebook.java:
##########
@@ -109,6 +119,60 @@ public void recoveryIfNecessary() {
     }
   }
 
+  /**
+   * Subsystems can add a consumer, which is executed during initialization.
+   * Initialization is performed in parallel and with multiple threads.
+   *
+   * The consumer must be thread safe.
+   *
+   * @param initConsumer Consumer, which is passed the NoteId.
+   */
+  public void addInitConsumer(Consumer<String> initConsumer) {
+    this.initConsumers.add(initConsumer);
+  }
+
+  /**
+   * Asynchronous and parallel initialization of notes (during startup)
+   */
+  public void initNotebook() {
+    initExecutor = new ThreadPoolExecutor(0, Runtime.getRuntime().availableProcessors(), 1, TimeUnit.MINUTES,

Review Comment:
   @jongyoul Please check again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] Reamer commented on pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
Reamer commented on PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#issuecomment-1162907579

   > Could you please verify it as well?
   
   I have verified the changes. These should still be covered by the existing tests.
   
   > If so, could you please summarize the changes?
   
   I have summarized the changes in the PR description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] jongyoul commented on pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
jongyoul commented on PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#issuecomment-1173110653

   Overall, LGTM except for the minor comment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] Reamer merged pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
Reamer merged PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] Reamer commented on a diff in pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
Reamer commented on code in PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#discussion_r903486737


##########
zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/scheduler/QuartzSchedulerService.java:
##########
@@ -51,123 +51,110 @@ public class QuartzSchedulerService implements SchedulerService {
   private final ZeppelinConfiguration zeppelinConfiguration;
   private final Notebook notebook;
   private final Scheduler scheduler;
-  private final Thread loadingNotesThread;
 
   @Inject
   public QuartzSchedulerService(ZeppelinConfiguration zeppelinConfiguration, Notebook notebook)
       throws SchedulerException {
     this.zeppelinConfiguration = zeppelinConfiguration;
     this.notebook = notebook;
     this.scheduler = getScheduler();
-    this.scheduler.getListenerManager().addJobListener(new CronJobListener());
+    this.scheduler.getListenerManager().addJobListener(new MetricCronJobListener());
+    this.scheduler.getListenerManager().addTriggerListener(new ZeppelinCronJobTriggerListerner());
     this.scheduler.start();
-
-    // Do in a separated thread because there may be many notes,
-    // loop all notes in the main thread may block the restarting of Zeppelin server
-    // TODO(zjffdu) It may cause issue when user delete note before this thread is finished
-    this.loadingNotesThread = new Thread(() -> {
-        LOGGER.info("Starting init cronjobs");
-        notebook.getNotesInfo().stream()
-                .forEach(entry -> {
-                  try {
-                    refreshCron(entry.getId());
-                  } catch (Exception e) {
-                    LOGGER.warn("Fail to refresh cron for note: {}", entry.getId());
-                  }
-                });
-        LOGGER.info("Complete init cronjobs");
-    });
-    loadingNotesThread.setName("Init CronJob Thread");
-    loadingNotesThread.setDaemon(true);
-    loadingNotesThread.start();
+    // Start Init
+    notebook.addInitConsumer(this::refreshCron);
   }
 
+
   private Scheduler getScheduler() throws SchedulerException {
     return new StdSchedulerFactory().getScheduler();
   }
 
-  /**
-   * This is only for testing, unit test should always call this method in setup() before testing.
-   */
-  @VisibleForTesting
-  public void waitForFinishInit() {
+  @PreDestroy

Review Comment:
   As far as I can see, it does. We also have @PreDestroy in the Lucene sub-system to close the index cleanly and shut down the SearchService cleanly with it.
   https://github.com/apache/zeppelin/blob/5ebd0fe6231ec0b6d3576ee4ec3be14449fe5c2f/zeppelin-zengine/src/main/java/org/apache/zeppelin/search/LuceneSearch.java#L416-L430



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] Reamer commented on a diff in pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
Reamer commented on code in PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#discussion_r903619482


##########
zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/scheduler/QuartzSchedulerService.java:
##########
@@ -51,123 +51,110 @@ public class QuartzSchedulerService implements SchedulerService {
   private final ZeppelinConfiguration zeppelinConfiguration;
   private final Notebook notebook;
   private final Scheduler scheduler;
-  private final Thread loadingNotesThread;
 
   @Inject
   public QuartzSchedulerService(ZeppelinConfiguration zeppelinConfiguration, Notebook notebook)
       throws SchedulerException {
     this.zeppelinConfiguration = zeppelinConfiguration;
     this.notebook = notebook;
     this.scheduler = getScheduler();
-    this.scheduler.getListenerManager().addJobListener(new CronJobListener());
+    this.scheduler.getListenerManager().addJobListener(new MetricCronJobListener());
+    this.scheduler.getListenerManager().addTriggerListener(new ZeppelinCronJobTriggerListerner());
     this.scheduler.start();
-
-    // Do in a separated thread because there may be many notes,
-    // loop all notes in the main thread may block the restarting of Zeppelin server
-    // TODO(zjffdu) It may cause issue when user delete note before this thread is finished
-    this.loadingNotesThread = new Thread(() -> {
-        LOGGER.info("Starting init cronjobs");
-        notebook.getNotesInfo().stream()
-                .forEach(entry -> {
-                  try {
-                    refreshCron(entry.getId());
-                  } catch (Exception e) {
-                    LOGGER.warn("Fail to refresh cron for note: {}", entry.getId());
-                  }
-                });
-        LOGGER.info("Complete init cronjobs");
-    });
-    loadingNotesThread.setName("Init CronJob Thread");
-    loadingNotesThread.setDaemon(true);
-    loadingNotesThread.start();
+    // Start Init
+    notebook.addInitConsumer(this::refreshCron);
   }
 
+
   private Scheduler getScheduler() throws SchedulerException {
     return new StdSchedulerFactory().getScheduler();
   }
 
-  /**
-   * This is only for testing, unit test should always call this method in setup() before testing.
-   */
-  @VisibleForTesting
-  public void waitForFinishInit() {
+  @PreDestroy

Review Comment:
   You made me curious with your statement and it seems @PreDestroy is not working properly.
   I will investigate this further and remove the change from this PullRequest for now. The close method is only cosmetic and was needed when I took a different approach.
   Initial investigation has shown that `sharedServiceLocator.shutdown();` is needed to trigger the code with the PreDestroy annotation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zeppelin] jongyoul commented on pull request #4382: [ZEPPELIN-5754] Init Notes in background

Posted by GitBox <gi...@apache.org>.
jongyoul commented on PR #4382:
URL: https://github.com/apache/zeppelin/pull/4382#issuecomment-1162568627

   This PR looks like including CronJob changes. Could you please verify it as well? If so, could you please summarize the changes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org