Posted to user@nutch.apache.org by Joshua J Pavel <jp...@us.ibm.com> on 2011/09/08 19:51:21 UTC

-stats accessible through .jsp


I ask this from time to time, but I was wondering if anybody would have any
insight on how I might be able to make the -stats information
(nutch-1.2/bin/nutch crawl/crawldb -stats) accessible through
the front-end search.jsp?

I'm assuming I'll have to write a plugin to either populate some generic
fields on each run or put the code that generates the data in NutchBean;
either is OK with me. I'm just not sure how to go about it, since I don't
think there is a generic metadata field, only Hits (but I don't know much
about the structure of the DB), and I'm not sure where the code that
generates the -stats information lives.

Thanks!

Re: -stats accessible through .jsp

Posted by Joshua J Pavel <jp...@us.ibm.com>.
Thanks for the response.

I was trying this in my search.jsp:

import="org.apache.nutch.crawl.*"
...
crawldbreader = new CrawlDbReader(nutchConf.get("searcher.dir") + " -stats");

I believe that dumps to stdout, but I'll worry about the output format once
I get past this error:
/_search.java : 383 : The constructor CrawlDbReader(String) is undefined
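The compiler error is consistent with CrawlDbReader simply having no constructor that takes a String; in the shell command, -stats is an option handed to the reader, not part of the crawldb path. Once the stats text is in hand, one way to deal with the output format is to split it into label/value pairs for display. This is a minimal sketch that assumes (not verified against Nutch 1.2) that each statistic is printed as a label, a colon, a tab and a value, e.g. "TOTAL urls:<TAB>1234", with no log prefix in front. The class name CrawlDbStatsParser and the sample lines are hypothetical. It sticks to Java 6 syntax so it could plausibly be pasted into a JSP of that era.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns "label:<TAB>value" lines (the ASSUMED shape of
// readdb -stats output) into an ordered label -> value map.
class CrawlDbStatsParser {

    static Map<String, String> parse(List<String> lines) {
        // LinkedHashMap keeps the statistics in the order they were printed.
        Map<String, String> stats = new LinkedHashMap<String, String>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // headers and other lines without "label:<TAB>value"
            }
            String label = line.substring(0, tab).trim();
            if (label.endsWith(":")) {
                label = label.substring(0, label.length() - 1);
            }
            stats.put(label, line.substring(tab + 1).trim());
        }
        return stats;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not real output from a crawl.
        List<String> sample = Arrays.asList(
                "Statistics for CrawlDb: crawl/crawldb",
                "TOTAL urls:\t1234",
                "status 2 (db_fetched):\t1000");
        // prints {TOTAL urls=1234, status 2 (db_fetched)=1000}
        System.out.println(parse(sample));
    }
}
```

In search.jsp the returned map could then be rendered with a loop over entrySet(). If the real lines carry a timestamp or logger prefix, the label would need trimming before it is displayed.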

From:    lewis john mcgibbney <le...@gmail.com>
To:      user@nutch.apache.org
Cc:      nutch-user@lucene.apache.org
Date:    09/08/2011 03:30 PM
Subject: Re: -stats accessible through .jsp

Hi Joshua,

please see below

On Thu, Sep 8, 2011 at 6:51 PM, Joshua J Pavel <jp...@us.ibm.com> wrote:

>
>
> I ask this from time to time, but I was wondering if anybody would have any
> insight on how I might be able to make the -stats information
> (nutch-1.2/bin/nutch crawl/crawldb -stats) accessible through
> the front-end search.jsp?
>

In short, no, I don't know. What I do know is that running
'nutch-1.2/bin/nutch crawl/crawldb -stats' would not give you any results; I
think you would get an output listing all possible commands and how to call
them. The command you are referring to is readdb, which is an alias for
o.a.n.crawl.CrawlDbReader (i.e. 'nutch-1.2/bin/nutch readdb crawl/crawldb
-stats'). Take a look at this class and try to figure out how you could
access the stats data.
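Since readdb is only an alias, the simplest (if heavy-handed) way to reuse it from the webapp may be to run the command itself and capture what it prints. This is a minimal sketch under several assumptions: the servlet container is allowed to start external processes, the page knows the Nutch install directory and the crawldb path (the /opt/nutch-1.2 path in the comment is hypothetical), and the statistics actually reach the process's stdout or stderr, which depends on the log4j configuration rather than on anything shown here. The class name ReadDbRunner is also hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that shells out to "bin/nutch readdb <crawldb> -stats"
// and returns everything the command prints, one String per line.
class ReadDbRunner {

    // e.g. statsCommand("/opt/nutch-1.2", "crawl/crawldb") ->
    // [/opt/nutch-1.2/bin/nutch, readdb, crawl/crawldb, -stats]
    static List<String> statsCommand(String nutchHome, String crawlDb) {
        return Arrays.asList(nutchHome + "/bin/nutch", "readdb", crawlDb, "-stats");
    }

    static List<String> run(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout: logged output may land on either stream,
        // and draining a single stream avoids blocking on a full pipe.
        pb.redirectErrorStream(true);
        Process p = pb.start();
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(p.getInputStream(), "UTF-8"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            in.close();
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("exit status " + exit + " from " + command);
        }
        return lines;
    }
}
```

A page would call run(statsCommand(...)) and print the lines. Computing the stats reads the whole crawldb, so running it on every search request would likely be slow; caching the result between crawls is one option.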

>
> I'm assuming I'll have to write a plugin to either populate some generic
> fields on each run or put the code to generate the data in NutchBean,
> either of which is OK with me.


If you look (for example) at the ontology plugin that shipped with Nutch
1.2: this plugin was used only within the webapp, but various configuration
options also had to be set within the WAR file deployed in the servlet
container, just as the index directory of your Lucene index has to be. This
is the type of functionality you would need to get crawldb stats output,
but I'm afraid I have no more information, sorry.

HTH


--
*Lewis*

