Posted to announce@apache.org by lewis john mcgibbney <le...@apache.org> on 2013/06/25 00:54:09 UTC

[ANNOUNCE] Apache Nutch v1.7 Released

Hi All,

The Apache Nutch PMC are extremely pleased to announce the immediate
release of Apache Nutch v1.7.

Apache Nutch is an open source web-search software project. Stemming from
Apache Lucene <http://lucene.apache.org/java/>, it now builds on Apache
Solr <http://lucene.apache.org/solr/>, adding web-specifics such as a
crawler, a link-graph database and parsing support handled by Apache
Tika <http://tika.apache.org/> for HTML and an array of other document
formats.

This release includes over 20 bug fixes and as many improvements, most
notably a new pluggable indexing architecture
<https://issues.apache.org/jira/browse/NUTCH-1047>, which currently
supports Apache Solr <http://lucene.apache.org/solr> and Elastic
Search <http://www.elasticsearch.org/>.

Following the recent Nutch 2.2 release, parsing of robots.txt is now
delegated to Crawler-Commons <http://code.google.com/p/crawler-commons/>.

Key library upgrades have been made to Apache
Hadoop <http://hadoop.apache.org> 1.2.0 and Apache
Tika <http://tika.apache.org> 1.3. Please see the list of
changes <http://www.apache.org/dist/nutch/1.7/1.7-CHANGES.txt> or the
release report <http://s.apache.org/1zE> for this version for a full
breakdown.

As usual in the 1.x series, the release is made available as binary and
source (zip + tar.gz) and is also available within Maven
Central <http://search.maven.org/>.
The release is available here <http://www.apache.org/dyn/closer.cgi/nutch/>.


Happy crawling
lewismc
(on behalf of the Apache Nutch PMC)