Posted to user@nutch.apache.org by David Bargeron <Da...@nervana.com> on 2006/10/06 22:39:22 UTC
dump page content to Windows file system?
Hi - I have Nutch running on a Windows box. How do I dump the contents
of pages it crawls out of its database into the Windows file system?
Thanks!
Dave
Re: dump page content to Windows file system?
Posted by Dennis Kubes <nu...@dragonflymc.com>.
If you mean copying from the DFS to the local filesystem, you can use
copyToLocal. If you mean converting from a binary format to a readable
one, you would need to write a MapReduce job and specify a
TextOutputFormat. If you are trying to read the crawl database, you can
use the nutch readdb command.
Dennis
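
For reference, the options above might look roughly like this on the
command line. This is a sketch, not something taken from the thread: the
crawl directory, output directory, and segment name are hypothetical
placeholders, and the readseg line is an addition that the reply does not
mention (in Nutch 0.8 and later the fetched page content is stored in the
segments rather than in the crawl database). The script only prints the
commands unless DRY_RUN=0 is set. On Windows the Nutch scripts are usually
run from a Cygwin shell.

```shell
#!/bin/sh
# Hypothetical locations; adjust them to your own crawl layout.
CRAWL_DIR="crawl"
OUT_DIR="dump"
SEGMENT="$CRAWL_DIR/segments/20061006223922"  # placeholder segment name

# By default the commands are only printed. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# 1. Copy the crawl data from the DFS to the local filesystem.
run bin/hadoop dfs -copyToLocal "$CRAWL_DIR" "$OUT_DIR/crawl-copy"

# 2. Dump the crawl database (URLs and fetch status) as readable text.
run bin/nutch readdb "$CRAWL_DIR/crawldb" -dump "$OUT_DIR/crawldb-dump"

# 3. Assumption, not from the reply: dump one segment, including page content.
run bin/nutch readseg -dump "$SEGMENT" "$OUT_DIR/segment-dump"
```

Running it as-is prints the three command lines so they can be checked
before anything touches the crawl data.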
David Bargeron wrote:
> Hi - I have Nutch running on a Windows box. How do I dump the contents
> of pages it crawls out of its database into the Windows file system?
>
> Thanks!
>
> Dave