Posted to user@nutch.apache.org by David Bargeron <Da...@nervana.com> on 2006/10/06 22:39:22 UTC

dump page content to Windows file system?

Hi - I have Nutch running on a Windows box. How do I dump the contents
of pages it crawls out of its database into the Windows file system?

 

Thanks!

Dave


Re: dump page content to Windows file system?

Posted by Dennis Kubes <nu...@dragonflymc.com>.
If you mean copying from the DFS to the local filesystem, you can use
copyToLocal. If you mean converting from a binary format to a readable
one, you would need to write a MapReduce job and specify a
TextOutputFormat. If you are trying to read the crawl database, you can
use the nutch readdb command.

Dennis
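With the TextOutputFormat route, the result is plain text with one record per line: the key, a tab, the value, then a newline. The short sketch below imitates that layout in Python so the file format is easy to see. It is not a Nutch or Hadoop API. A real job would be written in Java against Hadoop's MapReduce classes and would read its records from the crawl data. The function names, the sample URL/content records and the part-00000.txt output path are all hypothetical.

```python
import os
import tempfile

def format_text_record(key, value, sep="\t"):
    """Render one record in a TextOutputFormat-style layout:
    key, separator (tab by default), value, newline.
    str() stands in for Java's toString() here."""
    return f"{key}{sep}{value}\n"

def write_text_output(records, path):
    """Write (key, value) pairs to a local UTF-8 text file, one record per line.
    newline="\n" keeps Unix line endings even when run on Windows."""
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for key, value in records:
            out.write(format_text_record(key, value))

# Hypothetical sample records: URL -> extracted page text.
records = [
    ("http://example.com/", "Example Domain"),
    ("http://example.org/about", "About us"),
]

# Hypothetical output location in the local temp directory.
out_path = os.path.join(tempfile.gettempdir(), "part-00000.txt")
write_text_output(records, out_path)

# Read the dump back to inspect it.
with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()

for line in lines:
    print(line)
```

Because each record is meant to fit on a single line, page content that contains its own newlines will spread across several lines of the dump. If you want one local file per page, you would need to write each value to its own file instead.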
