
Posted to user@nutch.apache.org by Pratik Poddar <pr...@gmail.com> on 2014/01/06 19:17:01 UTC

Nutch2 Readdb

I am using Nutch 2.x

I did everything as instructed in
http://wiki.apache.org/nutch/Nutch2Tutorial#!

After I run readdb, there is no error, no output, nothing.

http://wiki.apache.org/nutch/CommandLineOptions shows that readdb is not a
command. What is my next step?

Thanks

-- 
Pratik Poddar
http://www.cseblog.com
http://www.tomonotomo.com
http://pratikpoddar.wordpress.com/

Re: Nutch2 Readdb

Posted by Nguyen Manh Tien <ti...@gmail.com>.

Hi Pratik

If you run bin/nutch readdb with no arguments, you should see usage text like the following:

bin/nutch readdb
Usage: WebTableReader (-stats | -url [url] | -dump <out_dir> [-regex regex])
        [-crawlId <id>] [-content] [-headers] [-links] [-text]
    -crawlId <id>  - the id to prefix the schemas to operate on,
       (default: storage.crawl.id)
    -stats [-sort] - print overall statistics to System.out
    [-sort]        - list status sorted by host
    -url <url>     - print information on <url> to System.out
    -dump <out_dir> [-regex regex] - dump the webtable to a text file in
       <out_dir>
    -content       - dump also raw content
    -headers       - dump protocol headers
    -links         - dump links
    -text          - dump extracted text
    [-regex]       - filter on the URL of the webtable entry
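As an illustration of combining the -dump, -regex, -links and -text options listed above, here is a minimal POSIX shell sketch. The NUTCH variable and the readdb_dump_host function are hypothetical names, not part of Nutch. It assumes the -regex pattern is matched against the plain URL, as the usage text's "filter on the URL" suggests, so the host is wrapped in .* to match any scheme or path.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): dump webtable rows for one host.
# Assumes bin/nutch is reachable; override with NUTCH=/path/to/nutch.
NUTCH="${NUTCH:-bin/nutch}"

readdb_dump_host() {
  host=$1      # e.g. www.example.com
  out_dir=$2   # directory that readdb -dump writes into
  # Escape dots so the host name is matched literally in the regex, and
  # wrap it in .* so URLs with any scheme or path still match.
  pattern=".*$(printf '%s' "$host" | sed 's/\./\\./g').*"
  "$NUTCH" readdb -dump "$out_dir" -regex "$pattern" -links -text
}
```

For example, readdb_dump_host www.example.com webtable_dump would run bin/nutch readdb -dump webtable_dump -regex '.*www\.example\.com.*' -links -text.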

To see crawldb stats, you can use the command

bin/nutch readdb -stats
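Since the original symptom was a readdb run that printed nothing at all, a small wrapper that captures the output and says so explicitly when it is empty can help narrow things down. This is a minimal sketch assuming a POSIX shell; readdb_checked and NUTCH are hypothetical names. Both stdout and stderr are captured because this thread does not say which stream readdb writes its usage text and statistics to.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Nutch): run readdb and flag empty output.
# Assumes bin/nutch is reachable; override with NUTCH=/path/to/nutch.
NUTCH="${NUTCH:-bin/nutch}"

readdb_checked() {
  # Capture stdout and stderr together; $? right after the assignment is
  # the exit status of the command inside $(...).
  out=$("$NUTCH" readdb "$@" 2>&1)
  status=$?
  if [ -z "$out" ]; then
    # Nothing came back: report it on stderr together with the exit status.
    echo "readdb produced no output (exit $status)" >&2
  else
    printf '%s\n' "$out"
  fi
  return "$status"
}
```

Running readdb_checked -stats either prints whatever readdb returned or reports on stderr that nothing came back, along with the exit status, and the function returns that same status.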


