Posted to user@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2014/03/17 03:07:47 UTC

[ANNOUNCEMENT] Apache Nutch 1.8 Release

Good Evening,

The Apache Nutch PMC are pleased to announce the immediate release of
Apache Nutch v1.8.

Apache Nutch is a highly extensible and scalable open source web crawler
software project. Stemming from Apache Lucene, the project has diversified
and now comprises two codebases. Nutch 1.x is a mature, production-ready
crawler that enables fine-grained configuration and relies on Apache Hadoop
data structures, which are well suited to batch processing. Nutch 2.x is an
emerging alternative that takes direct inspiration from 1.x but differs in
one key area: storage is abstracted away from any specific underlying data
store by using Apache Gora to handle object-to-persistent-store mappings.
This means we can implement an extremely flexible model/stack for storing
everything (fetch time, status, content, parsed text, outlinks, inlinks,
etc.) in a number of NoSQL storage solutions.

We advise all current users and developers of the 1.x series to upgrade to
this release. In addition to library upgrades to Crawler Commons 0.3 and
Apache Tika 1.4, it provides over 30 bug fixes and 18 improvements. Please
see the list of changes <http://www.apache.org/dist/nutch/1.8/CHANGES.txt>
for a full breakdown, or see the release report <http://s.apache.org/oHY>.
As usual in the 1.x series, this release is made available as both source
and binary. Additionally, developers can find Maven artifacts in Maven
Central <http://search.maven.org/>. The release is available here
<http://www.apache.org/dyn/closer.cgi/nutch/>.

Thank you
Lewis
(On behalf of the Nutch PMC)

-- 
*Lewis*

Re: [ANNOUNCEMENT] Apache Nutch 1.8 Release

Posted by Julien Nioche <li...@gmail.com>.
Thanks Lewis!

-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble