You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nutch.apache.org by GitBox <gi...@apache.org> on 2022/01/05 17:54:06 UTC

[GitHub] [nutch] prakharchaube opened a new pull request #721: NUTCH-2923: Added JobId in Job Failure logs

prakharchaube opened a new pull request #721:
URL: https://github.com/apache/nutch/pull/721


   Fix for NUTCH-2923: Added JobId in Job Failure logs.
   @lewismc fyi.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] sebastian-nagel merged pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
sebastian-nagel merged pull request #721:
URL: https://github.com/apache/nutch/pull/721


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] prakharchaube commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
prakharchaube commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1015495906


   @lewismc @sebastian-nagel done the changes please review, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] lewismc commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1006021514


   Hi @prakharchaube sterling job!
   Please consider the following
   ```
   //parameterized logging... see https://www.slf4j.org/faq.html#logging_performance
   LOG.error("CrawlDb update job did not succeed, job id: {}, job status: {}, reason: {}", job.getJobID(), job.getStatus().getState(), job.getStatus().getFailureInfo());
   ```
   Vs what we currently have
   ```
           String message = "CrawlDb update job did not succeed, job id: "
               + job.getJobID() + ", job status:" + job.getStatus().getState()
               + ", reason: " + job.getStatus().getFailureInfo();
           LOG.error(message);
   ```
   It's up to you if you want to implement this as a separate patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] sebastian-nagel commented on a change in pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
sebastian-nagel commented on a change in pull request #721:
URL: https://github.com/apache/nutch/pull/721#discussion_r786958191



##########
File path: src/java/org/apache/nutch/util/SitemapProcessor.java
##########
@@ -402,9 +402,8 @@ public void sitemap(Path crawldb, Path hostdb, Path sitemapUrlDir, boolean stric
     try {
       boolean success = job.waitForCompletion(true);
       if (!success) {
-        String message = "SitemapProcessor_" + crawldb.toString()
-            + " job did not succeed, job status: " + job.getStatus().getState()
-            + ", reason: " + job.getStatus().getFailureInfo();
+        String message = NutchJob.getJobFailureLogMessage(
+            "SitemapProcessor_" + crawldb.toString(), job);

Review comment:
       I know it was already there: maybe simplify the job name in the message to simply "SitemapProcessor" and leave the path to the CrawlDb away?

##########
File path: src/java/org/apache/nutch/util/NutchJob.java
##########
@@ -81,4 +83,26 @@ public static void cleanupAfterFailure(Path tempDir, Path lock, FileSystem fs)
     }
   }
 
+  /**
+   * Method to return job failure log message. To be used across all Jobs
+   * 
+   * @param name
+   *          Name/Type of the job
+   * @param job
+   *          Job Object for Job details
+   * @return job failure log message
+   * @throws IOException
+   *           Can occur during fetching job status
+   * @throws InterruptedException
+   *           Can occur during fetching job status
+   */
+  public static String getJobFailureLogMessage(String name, Job job)
+      throws IOException, InterruptedException {
+    if (job != null) {
+      return String.format(JOB_FAILURE_LOG_FORMAT, name, job.getJobID(),
+          job.getStatus(), job.getStatus().getFailureInfo());

Review comment:
       Really `job.getStatus()` instead of `job.getStatus.getState()` ?
   
   The log output doesn't look readable:
   ```
   2022-01-18 17:37:04,892 ERROR o.a.n.c.Injector [main] Injector job did not succeed, job id: job_local138302276_0001, job status: job-id : job_local138302276_0001uber-mode : falsemap-progress : 1.0reduce-progress : 0.0cleanup-progress : 1.0setup-progress : 1.0runstate : FAILEDstart-time : 0user-name : sebpriority : DEFAULTscheduling-info : NAnum-used-slots0num-reserved-slots0used-mem0reserved-mem0needed-mem0, reason: NA
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] prakharchaube commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
prakharchaube commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1015009448


   @lewismc apologies for the delay I've fallen ill which is why I'm unable to work on it but I'll will finish it as soon as I'm recovered. Tentatively by this Sunday.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] lewismc commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1006873161


   What about [StringUtil.java](https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/util/StringUtil.java)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] prakharchaube commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
prakharchaube commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1006884336


   hmm, can add it in that class as well! Thanks, @lewismc, will make the amends and commit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] lewismc commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1014841784


   @prakharchaube would be great to get this into master branch. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] sebastian-nagel commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
sebastian-nagel commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1010301311


   NutchTool is used to run Nutch jobs via Nutch server and the REST API. By now, only the basic/common Nutch jobs (but not all) extend NutchTool.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] sebastian-nagel commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
sebastian-nagel commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1008803382


   Thanks, @prakharchaube - good idea! The code to handle job failures is kind of boilerplate for all Nutch tools. But there's some variation regarding the cleanup. In addition, log messages and stack traces should show the originating class (Injector, etc.) and not that of a utility class.
   
   > What about [StringUtil.java](https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/util/StringUtil.java)
   
   Maybe [NutchJob](https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/util/NutchJob.java) is also a good location for code for shared by Nutch tools/jobs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] lewismc commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1006175869


   @prakharchaube 
   
   > I see the same String message is further used as...
   
   That was bad of me. i should have looked another few lines down!
   
   > What I can do is use String.format(); to initialize the value of the message that would be neater.
   
   I agree that it would make for cleaner code.
   
   > is it the ideal exception choice here?
   
   Let's leave this for another ticket.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] prakharchaube commented on a change in pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
prakharchaube commented on a change in pull request #721:
URL: https://github.com/apache/nutch/pull/721#discussion_r786980017



##########
File path: src/java/org/apache/nutch/util/NutchJob.java
##########
@@ -81,4 +83,26 @@ public static void cleanupAfterFailure(Path tempDir, Path lock, FileSystem fs)
     }
   }
 
+  /**
+   * Method to return job failure log message. To be used across all Jobs
+   * 
+   * @param name
+   *          Name/Type of the job
+   * @param job
+   *          Job Object for Job details
+   * @return job failure log message
+   * @throws IOException
+   *           Can occur during fetching job status
+   * @throws InterruptedException
+   *           Can occur during fetching job status
+   */
+  public static String getJobFailureLogMessage(String name, Job job)
+      throws IOException, InterruptedException {
+    if (job != null) {
+      return String.format(JOB_FAILURE_LOG_FORMAT, name, job.getJobID(),
+          job.getStatus(), job.getStatus().getFailureInfo());

Review comment:
       Miss from my side! Done, thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] prakharchaube commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
prakharchaube commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1006040395


   @lewismc I agree we should use parameterized logging however, I see the same String _message_ is further used as
   throw new RuntimeException(message);
   
   What I can do is use String.format(); to initialize the value of the message that would be neater.
   
   Another doubt in my mind was pertaining to throwing RuntimeException, is it the ideal exception choice here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] prakharchaube commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
prakharchaube commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1006847258


   @lewismc Surely.
   Something I noticed is that all error messages have a common format which is
   "%s job did not succeed, job id: %s job status: %s ,reason: %s"
   so I was thinking of making a file for Constants (eg. CommonConstants.java) and adding this format there and reusing it everywhere instead of initializing the same format everywhere, please let me know if that is too much of a change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] prakharchaube commented on pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
prakharchaube commented on pull request #721:
URL: https://github.com/apache/nutch/pull/721#issuecomment-1009130088


   @sebastian-nagel thanks for the suggestion, not to worry the stack trace would still show the logs origin as the Job Classes.
   
   > Maybe [NutchJob](https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/util/NutchJob.java) is also a good location for code for shared by Nutch tools/jobs.
   
   Instead of NutchJob I was thinking [NutchTool ](https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/util/NutchTool.java) could be the class for incorporating the change, let me know your thoughts!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] prakharchaube commented on a change in pull request #721: NUTCH-2923: Added JobId in Job Failure logs

Posted by GitBox <gi...@apache.org>.
prakharchaube commented on a change in pull request #721:
URL: https://github.com/apache/nutch/pull/721#discussion_r786979424



##########
File path: src/java/org/apache/nutch/util/SitemapProcessor.java
##########
@@ -402,9 +402,8 @@ public void sitemap(Path crawldb, Path hostdb, Path sitemapUrlDir, boolean stric
     try {
       boolean success = job.waitForCompletion(true);
       if (!success) {
-        String message = "SitemapProcessor_" + crawldb.toString()
-            + " job did not succeed, job status: " + job.getStatus().getState()
-            + ", reason: " + job.getStatus().getFailureInfo();
+        String message = NutchJob.getJobFailureLogMessage(
+            "SitemapProcessor_" + crawldb.toString(), job);

Review comment:
       @sebastian-nagel Done. Thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org