Posted to user@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2016/05/03 22:42:42 UTC

Re: user Digest 3 May 2016 14:53:20 -0000 Issue 2582

Hi Bin,
Hope you are doing well!
Please see my response below.

On Tue, May 3, 2016 at 7:53 AM, <us...@nutch.apache.org> wrote:

>
> From: Bin Wang <bi...@gmail.com>
> To: "Apache.Nutch.User" <us...@nutch.apache.org>
> Cc:
> Date: Mon, 2 May 2016 13:26:27 -0600
> Subject: Visualization Tool for Nutch
> Hi there,
>
> Is there a state of the art visualization tool that is Nutch friendly?
>
> I am planning to get the crawldb information into a better format that can
> be digested by Neo4j or Gephi for analysis. However, I have read here
> <
> http://grokbase.com/t/nutch/user/124fbmankh/how-to-do-detailed-postmortem-analysis-and-visualization-of-nutch-crawl-data
> >
> and there <http://wiki.apache.org/nutch/bin/nutch%20webgraph> about the
> demand but I don't see any solid tutorial or documentation regarding the
> visualization.
>
> I don't think visualization is a necessity for Nutch but something out of
> the box will be interesting to have.  (people love graphs)
>
>
Mike Joyce and I were previously working on the following (currently
stalled):

   1. Upgrade the entire MR API to the 'New' MR API within the master branch.
   2. Use TinkerPop's ScriptInputFormat [0] to write an extension of the
   WebgraphDB out as input for Gremlin [1]. Once Nutch data is in such a
   format, we open up another world for graph analysis of Nutch data.
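
To give a rough idea of the export in item 2: ScriptInputFormat hands each
line of a text file to a user-supplied parse script, so the line layout is
ours to choose. The sketch below is only an illustration, not Nutch or
TinkerPop code. The function name edges_to_adjacency_lines and the layout
(source URL, a tab, then space-separated outlink URLs) are assumptions made
up for this example, and it assumes the (source, target) link pairs have
already been pulled out of the WebgraphDB by some other means.

```python
def edges_to_adjacency_lines(edges):
    """Group (source_url, target_url) pairs into one text line per vertex.

    Hypothetical layout: "<source>\t<target1> <target2> ...".
    Assumes URLs contain no raw tabs or spaces (i.e. they are encoded).
    """
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        # Pages that are only linked to still get a line of their own,
        # so they show up as vertices with no outlinks.
        adjacency.setdefault(target, set())
    return [
        source + "\t" + " ".join(sorted(targets))
        for source, targets in sorted(adjacency.items())
    ]


if __name__ == "__main__":
    sample = [
        ("http://a.example/", "http://b.example/"),
        ("http://a.example/", "http://c.example/"),
        ("http://b.example/", "http://c.example/"),
    ]
    for line in edges_to_adjacency_lines(sample):
        print(line)
```

A matching parse script on the Gremlin side would then split each line on
the tab and on spaces to rebuild the vertex and its outgoing edges.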

I'm going to restart working on 1 above... might even get it finished
during ApacheCon next week. We will see.

Lewis
[0] http://tinkerpop.apache.org/javadocs/3.2.0-incubating/full/index.html?org/apache/tinkerpop/gremlin/hadoop/structure/io/script/ScriptInputFormat.html
[1] https://github.com/tinkerpop/gremlin/wiki