Posted to user@nutch.apache.org by Bin Wang <bi...@gmail.com> on 2016/05/02 21:26:27 UTC

Visualization Tool for Nutch

Hi there,

Is there a state-of-the-art visualization tool that is Nutch-friendly?

I am planning to get the crawldb information into a format that Neo4j or
Gephi can digest for analysis. I have read about the demand here
<http://grokbase.com/t/nutch/user/124fbmankh/how-to-do-detailed-postmortem-analysis-and-visualization-of-nutch-crawl-data>
and there <http://wiki.apache.org/nutch/bin/nutch%20webgraph>, but I haven't
found any solid tutorial or documentation on visualization.
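One possible starting point is a small converter like the sketch below. Note that the link structure lives in the LinkDB rather than the CrawlDB, which stores per-URL status. The sketch assumes the LinkDB has been dumped to text with `bin/nutch readlinkdb <linkdb> -dump <out_dir>` and that each record is a target URL, a tab and `Inlinks:`, followed by indented ` fromUrl: <source> anchor: <text>` lines. That layout may differ between Nutch versions, so check a few lines of your own dump first. The names `parse_linkdb_dump` and `write_gephi_edges` are hypothetical. The output is a CSV edge list with `Source`, `Target` and `Label` columns, which Gephi's spreadsheet import recognizes and Neo4j's `LOAD CSV` can read.

```python
import csv
import io
import sys
from typing import Iterable, Iterator, TextIO, Tuple


def parse_linkdb_dump(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (source, target, anchor) edges from a text dump of the LinkDB.

    ASSUMED layout (verify against your own dump):
        <target URL><TAB>Inlinks:
         fromUrl: <source URL> anchor: <anchor text>
    """
    target = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank separator lines
        if "\tInlinks:" in line:
            # A new record starts: remember the URL being linked to.
            target = line.split("\t", 1)[0].strip()
        elif line.lstrip().startswith("fromUrl:") and target is not None:
            rest = line.lstrip()[len("fromUrl:"):].strip()
            # Split "<source> anchor: <text>"; the anchor text may be empty.
            source, _, anchor = rest.partition(" anchor:")
            yield source.strip(), target, anchor.strip()


def write_gephi_edges(edges: Iterable[Tuple[str, str, str]], out: TextIO) -> None:
    """Write edges as CSV with the Source/Target/Label headers Gephi expects."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Label"])
    for source, target, anchor in edges:
        writer.writerow([source, target, anchor])


if __name__ == "__main__":
    # Tiny inline sample in the assumed dump format.
    sample = (
        "http://example.com/\tInlinks:\n"
        " fromUrl: http://foo.com/ anchor: Example\n"
        " fromUrl: http://bar.com/ anchor:\n"
    )
    buf = io.StringIO()
    write_gephi_edges(parse_linkdb_dump(sample.splitlines()), buf)
    sys.stdout.write(buf.getvalue())
```

For a real crawl you would feed in the lines of the part files from the dump directory instead of the inline sample, opening the output file with `newline=""` as the csv module recommends.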

I don't think visualization is a necessity for Nutch, but something out of
the box would be nice to have (people love graphs).

Bin

Re: Visualization Tool for Nutch

Posted by Bin Wang <bi...@gmail.com>.
Hi Chris,

Thanks for sharing the Memex project; I will definitely take a look first.

Meanwhile, I am planning to start with Neo4j and Gephi and will keep you
posted. Visualizing graphs, especially big ones like most crawls, is
challenging.
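Since most crawls are too large to draw page by page, one common workaround (a general technique, not something proposed in this thread) is to collapse pages into their hosts and count the links between hosts, which can shrink the graph considerably. The sketch below assumes the edges are already available as (source URL, target URL) pairs, for example parsed from a LinkDB dump; `host_graph` is a hypothetical helper, and dropping same-host links is a design choice you may want to switch off.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlsplit


def host_graph(edges: Iterable[Tuple[str, str]],
               drop_self_loops: bool = True) -> Counter:
    """Collapse page-level (source, target) URL edges into host-level edges.

    Returns a Counter mapping (source_host, target_host) to the number of
    page-level links between the two hosts.
    """
    weights: Counter = Counter()
    for source, target in edges:
        # urlsplit(...).hostname is lowercased; fall back to the raw string
        # if the URL has no host part.
        src_host = urlsplit(source).hostname or source
        dst_host = urlsplit(target).hostname or target
        if drop_self_loops and src_host == dst_host:
            continue  # skip links that stay within one host
        weights[(src_host, dst_host)] += 1
    return weights


if __name__ == "__main__":
    page_edges = [
        ("http://a.example/x", "http://b.example/1"),
        ("http://a.example/y", "http://b.example/2"),
        ("http://a.example/x", "http://a.example/z"),
    ]
    for (src, dst), weight in host_graph(page_edges).items():
        print(src, dst, weight)
```

The counts can then be exported as a weighted edge list, so a tool like Gephi draws one edge per host pair instead of one per page link.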

Bin

On Mon, May 2, 2016 at 1:27 PM, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov> wrote:


Re: Visualization Tool for Nutch

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Bin, I completely agree.

My team built the following:

1. Memex Explorer (http://github.com/memex-explorer/memex-explorer), which
is no longer actively developed. It used Bokeh.js, together with streaming
publishing from Nutch (still under development), to publish events and
visualize crawls.

2. My team is using D3.js to visualize the Nutch crawl graph; a lot of
that work is still under development.

Are you interested in collaborating?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++