Posted to dev@nutch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/11/03 02:37:00 UTC

[jira] [Commented] (NUTCH-3014) Standardize Job names

    [ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782382#comment-17782382 ] 

ASF GitHub Bot commented on NUTCH-3014:
---------------------------------------

lewismc commented on code in PR #789:
URL: https://github.com/apache/nutch/pull/789#discussion_r1381111646


##########
src/java/org/apache/nutch/crawl/CrawlDbReader.java:
##########
@@ -812,7 +811,7 @@ public CrawlDatum get(String crawlDb, String url, Configuration config)
 
   @Override
   protected int process(String line, StringBuilder output) throws Exception {
-    Job job = NutchJob.getInstance(getConf());
+    Job job = Job.getInstance(getConf(), "Nutch CrawlDbReader: process " + this.crawlDb);

Review Comment:
   Thanks @sebastian-nagel 





> Standardize Job names
> ---------------------
>
>                 Key: NUTCH-3014
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3014
>             Project: Nutch
>          Issue Type: Improvement
>          Components: configuration, runtime
>    Affects Versions: 1.19
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.20
>
>
> There is a large degree of variability when we set the job name:
>  
> {{Job job = NutchJob.getInstance(getConf());}}
> {{job.setJobName("read " + segment);}}
>  
> Some examples mention the job name, others don't. Some use upper case, others don't, etc.
> I think we can standardize the NutchJob job names. This would help when filtering jobs in YARN ResourceManager UI as well.
> I propose we implement the following convention:
>  * *Nutch* (mandatory) - static value that prefixes the job name; it distinguishes the Job as a NutchJob and makes it easy to find.
>  * *${ClassName}* (mandatory) - literally the name of the class in which the job is defined
>  * *${additional info}* (optional) - value could further distinguish the type of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.)
> _*Nutch ${ClassName}*: *${additional info}*_
> _Examples:_
>  * _Nutch LinkRank: Inverter_
>  * _Nutch CrawlDb: + $crawldb_
>  * _Nutch LinkDbReader: + $linkdb_
> Thanks for any suggestions/comments.
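
The proposed convention can be sketched as a small helper. This is only an illustration under stated assumptions: the class JobNames and its method of(...) are hypothetical names and are not part of Nutch or of PR #789. The sketch uses plain Java only, so it runs without Hadoop on the classpath.

```java
// Hypothetical helper (not part of Nutch): builds job names that follow the
// proposed "Nutch ${ClassName}: ${additional info}" convention.
final class JobNames {

  // Static, mandatory prefix that marks the job as a Nutch job.
  private static final String PREFIX = "Nutch";

  private JobNames() {
    // utility class, no instances
  }

  // Name without additional info, e.g. "Nutch CrawlDbReader".
  static String of(String className) {
    return of(className, null);
  }

  // Name with optional additional info, e.g. "Nutch LinkRank: Inverter".
  static String of(String className, String additionalInfo) {
    if (className == null || className.trim().isEmpty()) {
      throw new IllegalArgumentException("className is mandatory");
    }
    String base = PREFIX + " " + className.trim();
    if (additionalInfo == null || additionalInfo.trim().isEmpty()) {
      return base; // additional info is optional
    }
    return base + ": " + additionalInfo.trim();
  }
}
```

A name built this way could then be passed wherever the job name is set, for example as the second argument of Hadoop's Job.getInstance(Configuration, String), as in the diff above. Whether a shared helper is preferable to spelling out the string at each call site is a design choice this sketch does not settle.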



--
This message was sent by Atlassian Jira
(v8.20.10#820010)