Posted to user@nutch.apache.org by Pratik Poddar <pr...@gmail.com> on 2014/01/06 19:17:01 UTC
Nutch2 Readdb
I am using Nutch 2.x
I did everything as instructed in
http://wiki.apache.org/nutch/Nutch2Tutorial#!
After I do readdb, there is no error, no output, nothing.
http://wiki.apache.org/nutch/CommandLineOptions shows that readdb is not a
command. What is my next step?
Thanks
--
Pratik Poddar
http://www.cseblog.com
http://www.tomonotomo.com
http://pratikpoddar.wordpress.com/
Re: Nutch2 Readdb
Posted by Nguyen Manh Tien <ti...@gmail.com>.
Hi Pratik
If you run bin/nutch readdb without arguments, you should see usage text like the following:
bin/nutch readdb
Usage: WebTableReader (-stats | -url [url] | -dump <out_dir> [-regex regex])
        [-crawlId <id>] [-content] [-headers] [-links] [-text]
    -crawlId <id>  - the id to prefix the schemas to operate on,
                     (default: storage.crawl.id)
    -stats [-sort] - print overall statistics to System.out
        [-sort]    - list status sorted by host
    -url <url>     - print information on <url> to System.out
    -dump <out_dir> [-regex regex] - dump the webtable to a text file in <out_dir>
        -content   - dump also raw content
        -headers   - dump protocol headers
        -links     - dump links
        -text      - dump extracted text
        [-regex]   - filter on the URL of the webtable entry
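If you drive readdb from a script, the mutually exclusive modes in the usage text above (-stats, -url, -dump) can be enforced before Nutch is ever called. This is a minimal Python sketch, not part of Nutch: build_readdb_args is a hypothetical helper. It assumes the script runs from the Nutch runtime directory so that bin/nutch resolves, and it places -crawlId and the dump flags after the mode option, as the usage line does.

```python
from typing import List, Optional


def build_readdb_args(
    stats: bool = False,
    sort: bool = False,
    url: Optional[str] = None,
    dump_dir: Optional[str] = None,
    regex: Optional[str] = None,
    crawl_id: Optional[str] = None,
    content: bool = False,
    headers: bool = False,
    links: bool = False,
    text: bool = False,
    nutch: str = "bin/nutch",  # assumption: run from the Nutch runtime dir
) -> List[str]:
    """Hypothetical helper: build a 'bin/nutch readdb' argument list.

    Mirrors (-stats | -url [url] | -dump <out_dir> [-regex regex])
    from the WebTableReader usage text: exactly one mode is allowed.
    """
    chosen = [stats, url is not None, dump_dir is not None]
    if sum(chosen) != 1:
        raise ValueError("choose exactly one of -stats, -url or -dump")
    if sort and not stats:
        raise ValueError("-sort only applies to -stats")
    if regex is not None and dump_dir is None:
        raise ValueError("-regex only applies to -dump")

    args = [nutch, "readdb"]
    if stats:
        args.append("-stats")
        if sort:
            args.append("-sort")
    elif url is not None:
        args += ["-url", url]
    else:
        args += ["-dump", dump_dir]
        if regex is not None:
            args += ["-regex", regex]

    if crawl_id is not None:
        args += ["-crawlId", crawl_id]

    # Detail flags from the usage text; they are appended as given and
    # not validated against the chosen mode (an assumption of this sketch).
    for flag, enabled in (("-content", content), ("-headers", headers),
                          ("-links", links), ("-text", text)):
        if enabled:
            args.append(flag)
    return args
```

For example, subprocess.run(build_readdb_args(dump_dir="webtable-dump", links=True), check=True) would run a dump with links included; "webtable-dump" is just an illustrative directory name. Passing a list rather than a shell string keeps a -regex pattern from being mangled by the shell.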
To see crawldb statistics, you can use the command:
bin/nutch readdb -stats
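Since the original symptom was a readdb run with "no error, no output, nothing", a script that calls -stats can keep the exit code and both output streams so that nothing is silently dropped. A minimal Python sketch, not a Nutch API: run_readdb_stats and its nutch_home parameter are hypothetical, and nutch_home is assumed to be the directory that contains bin/nutch.

```python
import subprocess
from pathlib import Path
from typing import Optional, Tuple


def run_readdb_stats(nutch_home: str = ".", crawl_id: Optional[str] = None,
                     timeout: float = 600.0) -> Tuple[int, str, str]:
    """Hypothetical helper: run 'bin/nutch readdb -stats'.

    Returns (exit code, stdout, stderr) so a silent run can be told
    apart from a failed one.
    """
    # Resolve to an absolute path, because cwd is changed below and a
    # relative executable path would then be looked up in the wrong place.
    nutch = (Path(nutch_home) / "bin" / "nutch").resolve()
    if not nutch.is_file():
        # Fail loudly instead of producing no output at all.
        raise FileNotFoundError(f"{nutch} not found; check nutch_home")

    args = [str(nutch), "readdb", "-stats"]
    if crawl_id is not None:
        args += ["-crawlId", crawl_id]

    result = subprocess.run(args, cwd=nutch_home, capture_output=True,
                            text=True, timeout=timeout)
    return result.returncode, result.stdout, result.stderr
```

Usage might look like code, out, err = run_readdb_stats("runtime/local"), where runtime/local is an assumed location for a source build; adjust it to your own layout. If out and err are both empty, Nutch's own log files are the next place to look.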