Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/05/22 11:24:41 UTC

[jira] [Created] (NUTCH-1370) Expose exact number of urls injected @runtime

Lewis John McGibbney created NUTCH-1370:
-------------------------------------------

             Summary: Expose exact number of urls injected @runtime 
                 Key: NUTCH-1370
                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
             Project: Nutch
          Issue Type: Improvement
          Components: injector
    Affects Versions: 1.4, nutchgora
            Reporter: Lewis John McGibbney
             Fix For: 1.5, 2.1


Example: When using trunk, currently we see 

{code}
2012-05-22 09:04:00,239 INFO  crawl.Injector - Injector: starting at 2012-05-22 09:04:00
2012-05-22 09:04:00,239 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2012-05-22 09:04:00,239 INFO  crawl.Injector - Injector: urlDir: urls
2012-05-22 09:04:00,253 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2012-05-22 09:04:00,955 INFO  plugin.PluginRepository - Plugins: looking in:
{code}

I would like to see

{code}
2012-05-22 09:04:00,239 INFO  crawl.Injector - Injector: starting at 2012-05-22 09:04:00
2012-05-22 09:04:00,239 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2012-05-22 09:04:00,239 INFO  crawl.Injector - Injector: urlDir: urls
2012-05-22 09:04:00,253 INFO  crawl.Injector - Injector: Injected N urls to crawl/crawldb
2012-05-22 09:04:00,253 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2012-05-22 09:04:00,955 INFO  plugin.PluginRepository - Plugins: looking in:
{code}

This would make debugging easier and would help those who end up getting 

{code}
2012-05-22 09:04:04,850 WARN  crawl.Generator - Generator: 0 records selected for fetching, exiting ...
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493885#comment-13493885 ] 

Ferdy Galema commented on NUTCH-1370:
-------------------------------------

Hi,

I checked the patch; it seems you are simply grabbing the standard counters for this purpose. Did you know you can write out all counters with LOG.info(currentJob.getCounters())?

The best way for the purpose of this issue is to simply make a custom counter. So for every url injected (every context.write() call) you call something like:
context.getCounter("injector", "urls_injected").increment(1);

Then outputting this count at the end of the job is trivial. You could output all counters (like stated above) or only the "injector" group, whatever pleases you.
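
As a rough illustration of the custom-counter idea above, here is a dependency-free Java sketch. It does not use Hadoop: the class CounterSketch and its methods are hypothetical stand-ins for the framework's counters, and the seed URLs are made up. It only shows the shape of the pattern, i.e. one increment per emitted record, then reading a single counter (or a whole group) once the work is done.

```java
import java.util.Map;
import java.util.TreeMap;

// Dependency-free stand-in for a job's counters: group -> counter name -> value.
// Class and method names here are hypothetical, not Hadoop or Nutch API.
class CounterSketch {
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // Plays the role of context.getCounter(group, name).increment(delta).
    void increment(String group, String name, long delta) {
        groups.computeIfAbsent(group, g -> new TreeMap<>())
              .merge(name, delta, Long::sum);
    }

    // Current value of one counter; 0 if it was never incremented.
    long get(String group, String name) {
        Map<String, Long> counters = groups.get(group);
        if (counters == null) {
            return 0L;
        }
        Long value = counters.get(name);
        return value == null ? 0L : value;
    }

    // One "group.name=value" line per counter in a single group, for logging.
    String describeGroup(String group) {
        StringBuilder sb = new StringBuilder();
        Map<String, Long> counters = groups.get(group);
        if (counters != null) {
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                sb.append(group).append('.').append(e.getKey())
                  .append('=').append(e.getValue()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CounterSketch counters = new CounterSketch();
        String[] seeds = {"http://nutch.apache.org/", "http://example.org/"};
        for (String url : seeds) {
            // One increment per emitted record, i.e. per context.write() call.
            counters.increment("injector", "urls_injected", 1);
        }
        System.out.println("Injector: Injected "
            + counters.get("injector", "urls_injected") + " urls");
        System.out.print(counters.describeGroup("injector"));
    }
}
```

In a real job the increments would happen inside the mapper through the context, as in the one-liner above, and the value would be read from the finished job's counters rather than from a local map.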
                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-2.x.patch

[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1370:
----------------------------------------

    Patch Info: Patch Available
    
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch

[jira] [Assigned] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney reassigned NUTCH-1370:
-------------------------------------------

    Assignee: Lewis John McGibbney
    
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Sebastian Nagel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487316#comment-13487316 ] 

Sebastian Nagel commented on NUTCH-1370:
----------------------------------------

+1
Would be nice to see also the number of injected URLs rejected by URL filters.
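
A minimal sketch of what counting both totals could look like, in plain Java with no Nutch or Hadoop dependency. Everything named here (InjectCountSketch, Totals, the sample seeds, the log wording) is hypothetical, and two conventions are assumed for the sketch: a filter returns null for a rejected URL, and blank seed lines are not counted at all.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: count seed URLs kept versus rejected by a filter.
class InjectCountSketch {
    // Simple holder for the two totals.
    static final class Totals {
        final long injected;
        final long rejected;

        Totals(long injected, long rejected) {
            this.injected = injected;
            this.rejected = rejected;
        }
    }

    // The filter returns the (possibly rewritten) URL, or null to reject it.
    static Totals count(List<String> seedLines, UnaryOperator<String> filter) {
        long injected = 0;
        long rejected = 0;
        for (String line : seedLines) {
            String url = line.trim();
            if (url.isEmpty()) {
                continue; // assumption: blank lines are not counted at all
            }
            if (filter.apply(url) == null) {
                rejected++;
            } else {
                injected++;
            }
        }
        return new Totals(injected, rejected);
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList(
            "http://nutch.apache.org/", "ftp://example.org/file", "");
        // Toy filter: accept only plain http URLs.
        Totals t = count(seeds, u -> u.startsWith("http://") ? u : null);
        System.out.println("Injector: urls rejected by filters: " + t.rejected);
        System.out.println("Injector: urls injected: " + t.injected);
    }
}
```

Keeping the two totals separate makes an empty crawl db easier to diagnose: a seed list that is entirely filtered out shows up as a non-zero rejected count rather than as a silent zero.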
                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487322#comment-13487322 ] 

Lewis John McGibbney commented on NUTCH-1370:
---------------------------------------------

No hassle, Seb. I will also write this into the patch. Injector logging in its current state (at INFO level) is very brief, so I'll set all of the new logging to INFO level as well. Thanks.
                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503013#comment-13503013 ] 

Hudson commented on NUTCH-1370:
-------------------------------

Integrated in Nutch-trunk #2026 (See [https://builds.apache.org/job/Nutch-trunk/2026/])
    NUTCH-1370 Expose exact number of urls injected @runtime (Revision 1412573)

     Result = SUCCESS
lewismc : 
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java

                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch, NUTCH-1370-2.x-v3.patch

[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-1370:
---------------------------------

    Affects Version/s:     (was: 1.4)
                       1.5
        Fix Version/s:     (was: 1.5)
                       1.6
    
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.1

[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1370:
----------------------------------------

    Attachment: NUTCH-1370-2.x-v2.patch

Second WIP patch for 2.x. I'm having difficulty correctly implementing JobClient#runJob, as the currentJob param is not correct...
{code}
RunningJob mapJob = JobClient.runJob(currentJob);
{code}

@Seb,
Regarding your patch: this looks great and is much cleaner than my proposal. I've tested it and I'm +1 for committing.
                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch

[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1370:
----------------------------------------

    Attachment: NUTCH-1370-2.x.patch

WIP patch for 2.x. I am convinced that I'm not using the Counters, Counter, or Job API correctly here. I've spent a bit of time working my way around the various classes and methods, but I am not getting accurate values for the map input and output counters. If someone could take a look and correct me here, it would make my day. I will cook up the 1.x patch once I learn the right way.
                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-2.x.patch

[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Sebastian Nagel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-1370:
-----------------------------------

    Attachment: NUTCH-1370-2.x-v3.patch

Hi Lewis, yes, the 1.x patch is not easily transferred to 2.x because of the different (old vs. new) MapReduce APIs. Here is a trial...
One question: the logged line "number of urls attempting to inject" suggests that there is a third count "urls successfully injected" or similar. What's the intention with "attempting"?

                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch, NUTCH-1370-2.x-v3.patch

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502799#comment-13502799 ] 

Lewis John McGibbney commented on NUTCH-1370:
---------------------------------------------

Tested against medium-sized seed lists and it works a charm. I like the counters, Seb; thanks for this contribution.
Committed @revision 1412566 in 2.x
This also covers NUTCH-1471
I also added the correct mapping for the host table in gora-cassandra-mapping.xml
                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch, NUTCH-1370-2.x-v3.patch

[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1370:
----------------------------------------

    Fix Version/s:     (was: 2.1)
                   2.2
    
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>

[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-1370:
---------------------------------

    Priority: Minor  (was: Major)

Running in pseudo-distributed mode gives you more information if you look at the Hadoop web interface: you get the number of items passed to the mappers and reducers, etc. You can of course add a message like this to the logs; it won't do any harm :-)
                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: 1.4, nutchgora
>            Reporter: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.5, 2.1
>
>

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502813#comment-13502813 ] 

Hudson commented on NUTCH-1370:
-------------------------------

Integrated in nutch-trunk-maven #503 (See [https://builds.apache.org/job/nutch-trunk-maven/503/])
    NUTCH-1370 Expose exact number of urls injected @runtime (Revision 1412573)

     Result = SUCCESS
lewismc : 
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java

                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch, NUTCH-1370-2.x-v3.patch
>
>

[jira] [Resolved] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved NUTCH-1370.
-----------------------------------------

    Resolution: Fixed

Committed @revision 1412573 in trunk
Thank you everyone for the input here.
                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch, NUTCH-1370-2.x-v3.patch
>
>

[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Sebastian Nagel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-1370:
-----------------------------------

    Attachment: NUTCH-1370-1.x.patch

Ferdy is right: custom counters are more transparent.
Patch for 1.x
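
For readers without the patch at hand, the idea of counting injected urls and reporting the total can be sketched roughly as below. This is only an illustration in plain Java, not the code from the attached patches and not the Hadoop counter API: the class name, the counter names ("urls_injected", "urls_filtered"), the rule that only http(s) urls are accepted, and the skipping of "#" comment lines are all assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: NOT the code from the NUTCH-1370 patches.
class InjectCounterSketch {
    // Counter names are illustrative assumptions, not Nutch's real ones.
    static final String INJECTED = "urls_injected";
    static final String FILTERED = "urls_filtered";

    /** Counts seed lines accepted as urls versus lines rejected. */
    static Map<String, Long> count(List<String> seedLines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put(INJECTED, 0L);
        counters.put(FILTERED, 0L);
        for (String line : seedLines) {
            String url = line.trim();
            if (url.isEmpty() || url.startsWith("#")) {
                continue; // assumption: blank lines and comments are not counted
            }
            counters.merge(isAcceptable(url) ? INJECTED : FILTERED, 1L, Long::sum);
        }
        return counters;
    }

    /** Assumed acceptance rule: a parseable http(s) url with a host. */
    static boolean isAcceptable(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            return uri.getHost() != null
                && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Builds the log message in the form requested in the issue description. */
    static String summary(Map<String, Long> counters, String crawlDb) {
        return "Injector: Injected " + counters.get(INJECTED) + " urls to " + crawlDb;
    }
}
```

Calling InjectCounterSketch.summary(InjectCounterSketch.count(lines), "crawl/crawldb") gives a message of the form "Injector: Injected N urls to crawl/crawldb". Keeping a separate count of rejected lines is what makes a later "0 records selected for fetching" easier to trace back to the seed list.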

                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch
>
>

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503012#comment-13503012 ] 

Hudson commented on NUTCH-1370:
-------------------------------

Integrated in Nutch-nutchgora #412 (See [https://builds.apache.org/job/Nutch-nutchgora/412/])
    NUTCH-1370 Expose exact number of urls injected @runtime (Revision 1412570)
    NUTCH-1370 Expose exact number of urls injected @runtime (Revision 1412566)

     Result = SUCCESS
lewismc : 
Files : 
* /nutch/branches/2.x/CHANGES.txt

lewismc : 
Files : 
* /nutch/branches/2.x/conf/gora-cassandra-mapping.xml
* /nutch/branches/2.x/src/java/org/apache/nutch/crawl/InjectorJob.java
* /nutch/branches/2.x/src/java/org/apache/nutch/storage/StorageUtils.java

                
> Expose exact number of urls injected @runtime 
> ----------------------------------------------
>
>                 Key: NUTCH-1370
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1370
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.6, 2.2
>
>         Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch, NUTCH-1370-2.x-v3.patch
>
>