Posted to user@nutch.apache.org by Rafael Pappert <rp...@fwpsystems.com> on 2012/03/02 12:31:16 UTC
Webgraph / getmerge
Hello List,
How do I get the inlinks, outlinks, and nodes from HDFS into a plain text file?
I created the webgraph with this command:
nutch webgraph -segmentDir crawl/segments -webgraphdb webgraph
Now I am trying to get the data out with getmerge, like this:
hadoop fs -getmerge webgraph/inlinks /tmp/webgraph/inlinks
After less than a second, /tmp/webgraph/inlinks is created, but its size is 0 bytes.
Yet hdfs:/user/xyz/webgraph/inlinks contains many part-xxxx directories and is several GB in size.
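One plausible explanation, stated here as an assumption: if getmerge behaves like Hadoop 1.x FileUtil.copyMerge, it concatenates only regular files that sit directly inside the source directory and silently skips subdirectories. Since every part-xxxx entry under inlinks is a directory, that would produce exactly this 0-byte result. The sketch below models that behavior on the local filesystem only (not HDFS); merge_top_level is a hypothetical helper, not a Hadoop command. Note also that part-xxxx directories are the usual layout of Hadoop MapFile output, so the files inside them are likely binary rather than plain text.

```shell
#!/bin/sh
# Local model only: plain files on disk, not HDFS.
# merge_top_level is a hypothetical helper that mimics the assumed
# getmerge behavior: concatenate regular files directly under SRC
# and skip any subdirectories.
merge_top_level() {
  src=$1
  dst=$2
  : > "$dst"                      # create or truncate the destination
  for f in "$src"/*; do
    if [ -f "$f" ]; then          # regular files only; directories are skipped
      cat "$f" >> "$dst"
    fi
  done
}

# Build a directory shaped like webgraph/inlinks: the payload lives
# inside part-00000/, a subdirectory, not in a top-level file.
tmp=$(mktemp -d)
mkdir -p "$tmp/inlinks/part-00000"
printf 'some bytes\n' > "$tmp/inlinks/part-00000/data"

merge_top_level "$tmp/inlinks" "$tmp/merged"
size=$(wc -c < "$tmp/merged" | tr -d ' ')
echo "merged size: $size bytes"   # prints 0: the only data was in a subdirectory
rm -rf "$tmp"
```

Under this model, the merged file stays empty no matter how large the part-xxxx directories are.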
Thanks in advance,
Rafael.