Posted to dev@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2012/06/13 14:29:29 UTC

Suitable Nutch 2.0 Project Description

Hi,

Seeing as we have the ball rolling with the 2.0 RC, I thought I'd ask
about a suitable project description.

So far on trunk we have

** Apache Nutch is an open source web-search software project.
Stemming from Apache Lucene, it now builds on Apache Solr adding
web-specifics, such as a crawler, a link-graph database and parsing
support handled by Apache Tika for HTML and and array other document
formats.

This is merely a rough first attempt, but for Nutch 2.0 I was thinking of something like

** Apache Nutch 2.X is an experimental branch of the Apache Nutch open
source web-search software project. It builds on Apache Gora for data
persistence and Apache Solr for indexing adding web-specifics, such as
a crawler, a link-graph database and parsing support handled by Apache
Tika for HTML and and array other document formats.

Although there are not many changes here, I just wanted to run it by
you folks.

Thanks
Lewis

-- 
Lewis

Re: Suitable Nutch 2.0 Project Description

Posted by Julien Nioche <li...@gmail.com>.
"and and array other document" looks like a typo; the rest is fine.

On 13 June 2012 13:45, Ferdy Galema <fe...@kalooga.com> wrote:

> Hi,
>
> I would remove the 'experimental' notion. Aside from that it's fine with
> me.
>
> Ferdy.


-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Suitable Nutch 2.0 Project Description

Posted by Ferdy Galema <fe...@kalooga.com>.
Hi,

I would remove the 'experimental' notion. Aside from that it's fine with me.

Ferdy.


Re: Suitable Nutch 2.0 Project Description

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
+1 to the description without "experimental" too (I agree with Ferdy).

You guys ROCK.

Cheers,
Chris



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++