Posted to dev@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2012/06/13 14:29:29 UTC
Suitable Nutch 2.0 Project Description
Hi,
Seeing as we have the ball rolling with the 2.0 RC, I thought I'd ask
about a suitable project descriptor.
So far on trunk we have
** Apache Nutch is an open source web-search software project.
Stemming from Apache Lucene, it now builds on Apache Solr adding
web-specifics, such as a crawler, a link-graph database and parsing
support handled by Apache Tika for HTML and and array other document
formats.
This is merely a pot shot, but I was thinking for Nutch 2.0, something like
** Apache Nutch 2.X is an experimental branch of the Apache Nutch open
source web-search software project. It builds on Apache Gora for data
persistence and Apache Solr for indexing adding web-specifics, such as
a crawler, a link-graph database and parsing support handled by Apache
Tika for HTML and and array other document formats.
Although there are not many changes here, I just wanted to run it by
you folks...?
Thanks
Lewis
--
Lewis
Re: Suitable Nutch 2.0 Project Description
Posted by Julien Nioche <li...@gmail.com>.
" and and array other document " looks like a typo, rest is fine
On 13 June 2012 13:45, Ferdy Galema <fe...@kalooga.com> wrote:
> Hi,
>
> I would remove the 'experimental' notion. Aside from that it's fine with
> me.
>
> Ferdy.
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
Re: Suitable Nutch 2.0 Project Description
Posted by Ferdy Galema <fe...@kalooga.com>.
Hi,
I would remove the 'experimental' notion. Aside from that it's fine with me.
Ferdy.
Re: Suitable Nutch 2.0 Project Description
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
+1 to the description w/o experimental too (I agree with Ferdy).
You guys ROCK.
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++