Posted to user@nutch.apache.org by Rafael Pappert <rp...@fwpsystems.com> on 2012/03/02 12:31:16 UTC

Webgraph / getmerge

Hello List,

How do I get the inlinks/outlinks/nodes from HDFS into a plain text file?
I created the webgraph with this command:

nutch webgraph -segmentDir crawl/segments -webgraphdb webgraph

Now I am trying to get the data with getmerge, like this:

hadoop fs -getmerge webgraph/inlinks /tmp/webgraph/inlinks

After less than a second, /tmp/webgraph/inlinks is created, but with a
size of 0 bytes.

However, hdfs:/user/xyz/webgraph/inlinks contains many part-xxxx directories
and is several GB in size.

Thanks in advance,
Rafael.