Posted to commits@nutch.apache.org by le...@apache.org on 2014/06/09 21:00:41 UTC
svn commit: r1601480 - /nutch/cms_site/trunk/content/index.md
Author: lewismc
Date: Mon Jun 9 19:00:40 2014
New Revision: 1601480
URL: http://svn.apache.org/r1601480
Log:
Switch tabs to spaces
Modified:
nutch/cms_site/trunk/content/index.md
Modified: nutch/cms_site/trunk/content/index.md
URL: http://svn.apache.org/viewvc/nutch/cms_site/trunk/content/index.md?rev=1601480&r1=1601479&r2=1601480&view=diff
==============================================================================
--- nutch/cms_site/trunk/content/index.md (original)
+++ nutch/cms_site/trunk/content/index.md Mon Jun 9 19:00:40 2014
@@ -16,411 +16,409 @@ KIND, either express or implied. See th
specific language governing permissions and limitations
under the License.
-->
- <!-- Carousel
- ================================================== -->
-
- <div id="myCarousel" class="carousel slide">
- <div class="carousel-inner">
- <div class="item active">
- <img src="./assets/img/examples/all_systems_go_ahart.jpg" alt="">
- <div class="container">
- <div class="carousel-caption">
- <h1>Highly extensible, highly scalable Web crawler</h1>
- <p class="lead">Nutch is a well matured, production ready Web crawler. Nutch 1.x enables
- fine grained configuration, relying on <a href="http://hadoop.apache.org">Apache Hadoop™</a>
- data structures, which are great for batch processing.</p>
- <a class="btn btn-large btn-primary" href="downloads.html">Download</a>
- </div>
- </div>
- </div>
- <div class="item">
- <img src="./assets/img/examples/server_rack.jpg" alt="">
- <div class="container">
- <div class="carousel-caption">
- <h1>Pluggable parsing, protocols, storage and indexing</h1>
- <p class="lead">Being pluggable and modular of course has it's benefits,
- Nutch provides extensible interfaces such as Parse, Index and ScoringFilter's for custom
- implementations e.g. <a href="http://tika.apache.org">Apache Tika™</a> for parsing.
- Additonally, pluggable indexing exists for <a href="http://lucene.apache.org/solr">Apache Solr™</a>,
- <a href="http://www.elasticsearch.org/">Elastic Search</a>, <a href="https://cwiki.apache.org/confluence/display/solr/SolrCloud">SolrCloud</a>, etc.</p>
- <a class="btn btn-large btn-primary" href="http://wiki.apache.org/nutch/">Learn About</a>
- </div>
- </div>
- </div>
- <div class="item">
- <img src="./assets/img/examples/blade_servers.jpg" alt="">
- <div class="container">
- <div class="carousel-caption">
- <h1>Vibrant community, active development</h1>
- <p class="lead">Nutch 2.X branch is becoming an emerging alternative
- taking direct inspiration from 1.X. 2.X differs in one key area;
- storage is abstracted away from any specific underlying data store by using
- <a href="http://gora.apache.org">Apache Gora™</a> for handling object to persistent
- data store mappings.</p>
- <a class="btn btn-large btn-primary" href="mailing_lists.html">Join the Community</a>
- </div>
- </div>
- </div>
- <div class="item">
- <img src="./assets/img/examples/ganeti_cluster.jpg" alt="">
- <div class="container">
- <div class="carousel-caption">
- <!-- h1>Join the community...</h1-->
- <a class="twitter-timeline" href="https://twitter.com/ApacheNutch" data-widget-id="467920923970916353">Tweets by @ApacheNutch</a>
- <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
- <!-- a class="btn btn-large btn-primary" href="mailing_lists.html">Join the Community</a-->
- </div>
- </div>
- </div>
-
-
- </div>
- <a class="left carousel-control" href="#myCarousel" data-slide="prev">‹</a>
- <a class="right carousel-control" href="#myCarousel" data-slide="next">›</a>
- </div><!-- /.carousel -->
-
-
-
- <!-- Marketing messaging and featurettes
- This permits 3 featured topics to be included in
-
- ================================================== -->
- <!-- Wrap the rest of the page in another container to center all the content. -->
- <div class="container marketing">
-
- <!-- Three columns of text below the carousel -->
- <!-- div class="row">
- <div class="span4">
- <img class="img-circle" data-src="holder.js/140x140" src="">
- <h2></h2>
- <p>
- </p>
- <p><a class="btn" href=""></a></p>
- </div>
- <div class="span4">
- <img class="img-circle" data-src="holder.js/140x140" src="">
- <h2></h2>
- <p></p>
- <p><a class="btn" href=""></a>
- </p>
- </div>
- <div class="span4">
- <img class="img-circle" data-src="holder.js/140x140" src="">
- <h2></h2>
- <p></p>
- <p><a class="btn" href=""></a></p>
- </div>
- </div-->
-
- <h1>Apache Nutch News</h1>
-
- <div class="jumbotron">
- <h2>01 May 2014 - Apache Nutch Participates in <a href="https://www.google-melange.com/gsoc/homepage/google/gsoc2014">Google Summer of Code</a></h2>
- <a title="ApacheCon US 2009" href="http://www.us.apachecon.com/c/acus2009/">
- <img src="http://typo3.org/fileadmin/t3org/images/FM-news/2014/thisweek/920x156xbanner-gsoc2014.jpg" class="float-right" alt="GSoC Logo"/>
- </a>
- <p>For the first time in Nutch project history, we are participating as part of Apache's mentoring efforts in the ever popular
- <a href="https://www.google-melange.com/gsoc/homepage/google/gsoc2014">Google Summer of Code</a> program.
- This years project involves the <a href="https://issues.apache.org/jira/browse/NUTCH-841">creation of a Apache Wicket-based Web Application</a> for Nutch 2.X branch.
- </p>
- <p>Keep your eyes peeled and check here for updates as the project progresses throughout the summer.</p>
+<!-- Carousel================================================== -->
+<div id="myCarousel" class="carousel slide">
+ <div class="carousel-inner">
+ <div class="item active">
+ <img src="./assets/img/examples/all_systems_go_ahart.jpg" alt="">
+ <div class="container">
+ <div class="carousel-caption">
+ <h1>Highly extensible, highly scalable Web crawler</h1>
+ <p class="lead">Nutch is a well matured, production ready Web crawler. Nutch 1.x enables
+ fine grained configuration, relying on <a href="http://hadoop.apache.org">Apache Hadoop™</a>
+ data structures, which are great for batch processing.</p>
+ <a class="btn btn-large btn-primary" href="downloads.html">Download</a>
+ </div>
+ </div>
+ </div>
+ <div class="item">
+ <img src="./assets/img/examples/server_rack.jpg" alt="">
+ <div class="container">
+ <div class="carousel-caption">
+ <h1>Pluggable parsing, protocols, storage and indexing</h1>
+ <p class="lead">Being pluggable and modular of course has it's benefits,
+ Nutch provides extensible interfaces such as Parse, Index and ScoringFilter's for custom
+ implementations e.g. <a href="http://tika.apache.org">Apache Tika™</a> for parsing.
+ Additonally, pluggable indexing exists for <a href="http://lucene.apache.org/solr">Apache Solr™</a>,
+ <a href="http://www.elasticsearch.org/">Elastic Search</a>, <a href="https://cwiki.apache.org/confluence/display/solr/SolrCloud">SolrCloud</a>, etc.</p>
+ <a class="btn btn-large btn-primary" href="http://wiki.apache.org/nutch/">Learn About</a>
+ </div>
+ </div>
+ </div>
+ <div class="item">
+ <img src="./assets/img/examples/blade_servers.jpg" alt="">
+ <div class="container">
+ <div class="carousel-caption">
+ <h1>Vibrant community, active development</h1>
+ <p class="lead">Nutch 2.X branch is becoming an emerging alternative
+ taking direct inspiration from 1.X. 2.X differs in one key area;
+ storage is abstracted away from any specific underlying data store by using
+ <a href="http://gora.apache.org">Apache Gora™</a> for handling object to persistent
+ data store mappings.</p>
+ <a class="btn btn-large btn-primary" href="mailing_lists.html">Join the Community</a>
+ </div>
+ </div>
+ </div>
+ <div class="item">
+ <img src="./assets/img/examples/ganeti_cluster.jpg" alt="">
+ <div class="container">
+ <div class="carousel-caption">
+ <!-- h1>Join the community...</h1-->
+ <a class="twitter-timeline" href="https://twitter.com/ApacheNutch" data-widget-id="467920923970916353">Tweets by @ApacheNutch</a>
+ <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+ <!-- a class="btn btn-large btn-primary" href="mailing_lists.html">Join the Community</a-->
+ </div>
+ </div>
+ </div>
+
+
+ </div>
+ <a class="left carousel-control" href="#myCarousel" data-slide="prev">‹</a>
+ <a class="right carousel-control" href="#myCarousel" data-slide="next">›</a>
+ </div><!-- /.carousel -->
+
+
+
+ <!-- Marketing messaging and featurettes
+ This permits 3 featured topics to be included in
+
+ ================================================== -->
+ <!-- Wrap the rest of the page in another container to center all the content. -->
+ <div class="container marketing">
+
+ <!-- Three columns of text below the carousel -->
+ <!-- div class="row">
+ <div class="span4">
+ <img class="img-circle" data-src="holder.js/140x140" src="">
+ <h2></h2>
+ <p>
+ </p>
+ <p><a class="btn" href=""></a></p>
+ </div>
+ <div class="span4">
+ <img class="img-circle" data-src="holder.js/140x140" src="">
+ <h2></h2>
+ <p></p>
+ <p><a class="btn" href=""></a>
+ </p>
+ </div>
+ <div class="span4">
+ <img class="img-circle" data-src="holder.js/140x140" src="">
+ <h2></h2>
+ <p></p>
+ <p><a class="btn" href=""></a></p>
+ </div>
+ </div-->
+
+ <h1>Apache Nutch News</h1>
+
+ <div class="jumbotron">
+ <h2>01 May 2014 - Apache Nutch Participates in <a href="https://www.google-melange.com/gsoc/homepage/google/gsoc2014">Google Summer of Code</a></h2>
+ <a title="ApacheCon US 2009" href="http://www.us.apachecon.com/c/acus2009/">
+ <img src="http://typo3.org/fileadmin/t3org/images/FM-news/2014/thisweek/920x156xbanner-gsoc2014.jpg" class="float-right" alt="GSoC Logo"/>
+ </a>
+ <p>For the first time in Nutch project history, we are participating as part of Apache's mentoring efforts in the ever popular
+ <a href="https://www.google-melange.com/gsoc/homepage/google/gsoc2014">Google Summer of Code</a> program.
+ This years project involves the <a href="https://issues.apache.org/jira/browse/NUTCH-841">creation of a Apache Wicket-based Web Application</a> for Nutch 2.X branch.
+ </p>
+ <p>Keep your eyes peeled and check here for updates as the project progresses throughout the summer.</p>
</div>
<div class="jumbotron">
- <h2>07-09 April 2014 - Nutch at ApacheCon 2014, Denver Colorado</h2>
- <a title="ApacheCon NA 2014" href="http://events.linuxfoundation.org/events/apachecon-north-america">
- <img src="http://www.apache.org/events/current-event-125x125.png" class="float-right" alt="ApacheCon Logo"/>
- </a>
- <p>lots of talk and loads of exposure for this at ApacheCon NA 2014 in the beautiful city of Denver, CO.
- This year one presentation focused on <a href="http://sched.co/1pav9xl">Building your Big Data Search Stack with Apache Nutch 2.x</a>.
- You can see presentation slides below and follow the audio (sorry no video) <a href="https://www.youtube.com/watch?v=rIv3Js-zBpE">here</a></p>.
- <iframe src="http://prezi.com/embed/gkomeulfuqhh/?bgcolor=ffffff&lock_to_path=0&autoplay=0&autohide_ctrls=0&features=undefined&disabled_features=undefined" width="550" height="400" frameBorder="0" webkitAllowFullScreen mozAllowFullscreen allowfullscreen></iframe>
+ <h2>07-09 April 2014 - Nutch at ApacheCon 2014, Denver Colorado</h2>
+ <a title="ApacheCon NA 2014" href="http://events.linuxfoundation.org/events/apachecon-north-america">
+ <img src="http://www.apache.org/events/current-event-125x125.png" class="float-right" alt="ApacheCon Logo"/>
+ </a>
+ <p>lots of talk and loads of exposure for this at ApacheCon NA 2014 in the beautiful city of Denver, CO.
+ This year one presentation focused on <a href="http://sched.co/1pav9xl">Building your Big Data Search Stack with Apache Nutch 2.x</a>.
+ You can see presentation slides below and follow the audio (sorry no video) <a href="https://www.youtube.com/watch?v=rIv3Js-zBpE">here</a></p>.
+ <iframe src="http://prezi.com/embed/gkomeulfuqhh/?bgcolor=ffffff&lock_to_path=0&autoplay=0&autohide_ctrls=0&features=undefined&disabled_features=undefined" width="550" height="400" frameBorder="0" webkitAllowFullScreen mozAllowFullscreen allowfullscreen></iframe>
</div>
<div class="jumbotron">
- <h2>17 March 2014 - Apache Nutch v1.8 Released</h2>
- <p>The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.8, we advise all
- current users and developers of the 1.X series to upgrade to this release. Alhough this
- release includes library upgrades to <a href="http://code.google.com/p/crawler-commons/">Crawler Commons</a> 0.3 and
- <a href="http://tika.apache.org">Apache Tika</a> 1.5, it also provides over 30 bug fixes as well as 18 improvements.
- Please see the <a href="http://www.apache.org/dist/nutch/1.8/CHANGES.txt">list of changes</a> for a full
- breakdown, or see the <a href="http://s.apache.org/oHY">release report</a>.
- As usual in the 1.X series, this release is made available both as source and binary. Additionally developers
- can find Maven artifacts within <a href="http://search.maven.org/">Maven Central</a>.
- The release is available <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>17 March 2014 - Apache Nutch v1.8 Released</h2>
+ <p>The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.8, we advise all
+ current users and developers of the 1.X series to upgrade to this release. Alhough this
+ release includes library upgrades to <a href="http://code.google.com/p/crawler-commons/">Crawler Commons</a> 0.3 and
+ <a href="http://tika.apache.org">Apache Tika</a> 1.5, it also provides over 30 bug fixes as well as 18 improvements.
+ Please see the <a href="http://www.apache.org/dist/nutch/1.8/CHANGES.txt">list of changes</a> for a full
+ breakdown, or see the <a href="http://s.apache.org/oHY">release report</a>.
+ As usual in the 1.X series, this release is made available both as source and binary. Additionally developers
+ can find Maven artifacts within <a href="http://search.maven.org/">Maven Central</a>.
+ The release is available <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>02 July 2013 - Apache Nutch v2.2.1 Released</h2>
- <p>The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v2.2.1, we advise all
- current users and developers of the 2.X series to upgrade to this release ASAP. Although this
- release includes library upgrades to <a href="http://hadoop.apache.org">Apache Hadoop</a> 1.2.0 and
- <a href="http://tika.apache.org">Apache Tika</a> 1.3, it is predominantly a bug fix for
- <a href="https://issues.apache.org/jira/browse/NUTCH-1591">NUTCH-1591 - Incorrect conversion of ByteBuffer to String</a>.
- Please see the <a href="http://www.apache.org/dist/nutch/2.2.1/CHANGES-2.2.1.txt">list of changes</a> for a full
- breakdown, or see the <a href="http://s.apache.org/PGa">release report</a>.
- As usual in the 2.x series, this release is made available only as source, but is also available within
- <a href="http://search.maven.org/">Maven Central</a>.
- The release is available <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>02 July 2013 - Apache Nutch v2.2.1 Released</h2>
+ <p>The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v2.2.1, we advise all
+ current users and developers of the 2.X series to upgrade to this release ASAP. Although this
+ release includes library upgrades to <a href="http://hadoop.apache.org">Apache Hadoop</a> 1.2.0 and
+ <a href="http://tika.apache.org">Apache Tika</a> 1.3, it is predominantly a bug fix for
+ <a href="https://issues.apache.org/jira/browse/NUTCH-1591">NUTCH-1591 - Incorrect conversion of ByteBuffer to String</a>.
+ Please see the <a href="http://www.apache.org/dist/nutch/2.2.1/CHANGES-2.2.1.txt">list of changes</a> for a full
+ breakdown, or see the <a href="http://s.apache.org/PGa">release report</a>.
+ As usual in the 2.x series, this release is made available only as source, but is also available within
+ <a href="http://search.maven.org/">Maven Central</a>.
+ The release is available <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>24th June 2013 - Apache Nutch v1.7 Released</h2>
- <p>The Apache Nutch PMC are extremely pleased to announce the immediate release of Apache Nutch v1.7. This
- release includes over 20 bug fixes, as many improvements; most noticeably featuring a new
- <a href="https://issues.apache.org/jira/browse/NUTCH-1047">pluggable indexing architecture</a> which currently supports
- <a href="http://lucene.apache.org/solr">Apache Solr</a> and <a href="http://www.elasticsearch.org/">Elastic Search</a>.
- Shadowing the recent Nutch 2.2 release, parsing
- of Robots.txt is now delegated to <a href="http://code.google.com/p/crawler-commons/">
- Crawler-Commons</a>. Key library upgrades have been made to <a href="http://hadoop.apache.org">Apache Hadoop</a> 1.2.0
- and <a href="http://tika.apache.org">Apache Tika</a> 1.3. Please see the <a href="http://www.apache.org/dist/nutch/1.7/1.7-CHANGES.txt">list of
- changes</a> or the <a href="http://s.apache.org/1zE">release report</a> made in this version for a full
- breakdown.
- As usual in the 1.x series, the release is made available as binary and source (zip + tar.gz) and is also available within
- <a href="http://search.maven.org/">Maven Central</a>.
- The release is available <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>24th June 2013 - Apache Nutch v1.7 Released</h2>
+ <p>The Apache Nutch PMC are extremely pleased to announce the immediate release of Apache Nutch v1.7. This
+ release includes over 20 bug fixes, as many improvements; most noticeably featuring a new
+ <a href="https://issues.apache.org/jira/browse/NUTCH-1047">pluggable indexing architecture</a> which currently supports
+ <a href="http://lucene.apache.org/solr">Apache Solr</a> and <a href="http://www.elasticsearch.org/">Elastic Search</a>.
+ Shadowing the recent Nutch 2.2 release, parsing
+ of Robots.txt is now delegated to <a href="http://code.google.com/p/crawler-commons/">
+ Crawler-Commons</a>. Key library upgrades have been made to <a href="http://hadoop.apache.org">Apache Hadoop</a> 1.2.0
+ and <a href="http://tika.apache.org">Apache Tika</a> 1.3. Please see the <a href="http://www.apache.org/dist/nutch/1.7/1.7-CHANGES.txt">list of
+ changes</a> or the <a href="http://s.apache.org/1zE">release report</a> made in this version for a full
+ breakdown.
+ As usual in the 1.x series, the release is made available as binary and source (zip + tar.gz) and is also available within
+ <a href="http://search.maven.org/">Maven Central</a>.
+ The release is available <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>08 June 2013 - Apache Nutch v2.2 Released</h2>
- <p>The Apache Nutch PMC are extremely pleased to announce the immediate release of Apache Nutch v2.2. This
- release includes over 30 bug fixes and over 25 improvements representing the third release of increasingly
- popular 2.x Nutch series. This release features inclusion of <a href="http://code.google.com/p/crawler-commons/">
- Crawler-Commons</a> which Nutch now utilizes for improved robots.txt parsing, library upgrades to
- <a href="http://hadoop.apache.org">Apache Hadoop</a> 1.1.1, <a href="http://gora.apache.org">Apache Gora</a>
- 0.3, <a href="http://tika.apache.org">Apache Tika</a> 1.2 and <a href="http://www.brics.dk/automaton/automaton">
- Automaton</a> 1.11-8. Please see the <a href="http://www.apache.org/dist/nutch/2.2/2.2-CHANGES.txt">list of
- changes</a> or the <a href="http://s.apache.org/LPB">release report</a> made in this version for a full
- breakdown.
- As usual in the 2.x series, this release is made available only as source, but is also available within
- <a href="http://search.maven.org/">Maven Central</a>.
- The release is available <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>08 June 2013 - Apache Nutch v2.2 Released</h2>
+ <p>The Apache Nutch PMC are extremely pleased to announce the immediate release of Apache Nutch v2.2. This
+ release includes over 30 bug fixes and over 25 improvements representing the third release of increasingly
+ popular 2.x Nutch series. This release features inclusion of <a href="http://code.google.com/p/crawler-commons/">
+ Crawler-Commons</a> which Nutch now utilizes for improved robots.txt parsing, library upgrades to
+ <a href="http://hadoop.apache.org">Apache Hadoop</a> 1.1.1, <a href="http://gora.apache.org">Apache Gora</a>
+ 0.3, <a href="http://tika.apache.org">Apache Tika</a> 1.2 and <a href="http://www.brics.dk/automaton/automaton">
+ Automaton</a> 1.11-8. Please see the <a href="http://www.apache.org/dist/nutch/2.2/2.2-CHANGES.txt">list of
+ changes</a> or the <a href="http://s.apache.org/LPB">release report</a> made in this version for a full
+ breakdown.
+ As usual in the 2.x series, this release is made available only as source, but is also available within
+ <a href="http://search.maven.org/">Maven Central</a>.
+ The release is available <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>06 December 2012 - Apache Nutch v1.6 Released</h2>
- <p>The Apache Nutch PMC are extremely pleased to announce the release of Apache Nutch v1.6. This
- release includes over 20 bug fixes, the same in improvements, as well as new functionalities including a new HostNormalizer,
- the ability to dynamically set fetchInterval by MIME-type and functional enhancements to the Indexer API inluding the normalization
- of URL's and the deletion of robots noIndex documents. Other notable improvements include the upgrade of key dependencies to
- <a href="http://tika.apache.org/1.2/index.html">Tika 1.2</a> and <a href="http://www.brics.dk/automaton/">Automaton 1.11-8</a>.
- Please see the <a href="http://www.apache.org/dist/nutch/1.6/CHANGES_1.6.txt">list of changes</a> or the
- <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=10680&version=12319941">release report</a> made
- in this version for a full breakdown. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>06 December 2012 - Apache Nutch v1.6 Released</h2>
+ <p>The Apache Nutch PMC are extremely pleased to announce the release of Apache Nutch v1.6. This
+ release includes over 20 bug fixes, the same in improvements, as well as new functionalities including a new HostNormalizer,
+ the ability to dynamically set fetchInterval by MIME-type and functional enhancements to the Indexer API inluding the normalization
+ of URL's and the deletion of robots noIndex documents. Other notable improvements include the upgrade of key dependencies to
+ <a href="http://tika.apache.org/1.2/index.html">Tika 1.2</a> and <a href="http://www.brics.dk/automaton/">Automaton 1.11-8</a>.
+ Please see the <a href="http://www.apache.org/dist/nutch/1.6/CHANGES_1.6.txt">list of changes</a> or the
+ <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=10680&version=12319941">release report</a> made
+ in this version for a full breakdown. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>05 October 2012 - Apache Nutch v2.1 Released</h2>
- <p>The Apache Nutch PMC are very pleased to announce the release of Apache Nutch v2.1. This
- release continues to provide Nutch users with a simplified Nutch distribution building on the 2.x
- development drive which is growing in popularity amongst the community. As well as addressing ~20 bugs
- this release also offers improved properties for better <a href="http://lucene.apache.org/solr/">Solr</a> configuration, upgrades to various <a href="http://gora.apache.org">Gora</a> dependencies and the introduction of the option to build indexes in <a href="http://www.elasticsearch.org/">elastic search</a>.
- Please see the <a href="http://www.apache.org/dist/nutch/2.1/CHANGES-2.1.txt">list of changes</a> made
- in this version for a full breakdown. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>05 October 2012 - Apache Nutch v2.1 Released</h2>
+ <p>The Apache Nutch PMC are very pleased to announce the release of Apache Nutch v2.1. This
+ release continues to provide Nutch users with a simplified Nutch distribution building on the 2.x
+ development drive which is growing in popularity amongst the community. As well as addressing ~20 bugs
+ this release also offers improved properties for better <a href="http://lucene.apache.org/solr/">Solr</a> configuration, upgrades to various <a href="http://gora.apache.org">Gora</a> dependencies and the introduction of the option to build indexes in <a href="http://www.elasticsearch.org/">elastic search</a>.
+ Please see the <a href="http://www.apache.org/dist/nutch/2.1/CHANGES-2.1.txt">list of changes</a> made
+ in this version for a full breakdown. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
-
+
<div class="jumbotron">
- <h2>10 August 2012 - Happy 10th Birthday Apache Nutch!!</h2>
- <p>It's official, Apache Nutch is now a decade old! The project has come a long long way since inception, through <a href="#January+2005%3A+Nutch+Joins+Apache+Incubator">acceptance into the Apache Incubator</a> way back in Janurary 2005, to the <a href="#21+April+2010+-+Apache+Nutch+graduates+to+TLP">Top Level Project</a> it became on 21st April 2010. Happy birthday Nutch and thanks to all contributors past and present! See <a href="https://twitter.com/cutting/status/233415059798372353">Doug Cutting's tweet</a>.
- </p>
+ <h2>10 August 2012 - Happy 10th Birthday Apache Nutch!!</h2>
+ <p>It's official, Apache Nutch is now a decade old! The project has come a long long way since inception, through <a href="#January+2005%3A+Nutch+Joins+Apache+Incubator">acceptance into the Apache Incubator</a> way back in Janurary 2005, to the <a href="#21+April+2010+-+Apache+Nutch+graduates+to+TLP">Top Level Project</a> it became on 21st April 2010. Happy birthday Nutch and thanks to all contributors past and present! See <a href="https://twitter.com/cutting/status/233415059798372353">Doug Cutting's tweet</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>10 July 2012 - Apache Nutch v1.5.1 Released</h2>
- <p>The Apache Nutch PMC are very pleased to announce the release of Apache Nutch v1.5.1. This release is a maintainence release of the popular 1.5.X mainstream version of Nutch which has been widely adopted within the community.
- Please see the <a href="http://www.apache.org/dist/nutch/1.5.1/CHANGES.txt">list of changes</a> made
- in this version for a full breakdown. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>10 July 2012 - Apache Nutch v1.5.1 Released</h2>
+ <p>The Apache Nutch PMC are very pleased to announce the release of Apache Nutch v1.5.1. This release is a maintainence release of the popular 1.5.X mainstream version of Nutch which has been widely adopted within the community.
+ Please see the <a href="http://www.apache.org/dist/nutch/1.5.1/CHANGES.txt">list of changes</a> made
+ in this version for a full breakdown. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>07 July 2012 - Apache Nutch v2.0 Released</h2>
- <p>The Apache Nutch PMC are very pleased to announce the release of Apache Nutch v2.0. This
- release offers users an edition focused on large scale crawling which builds on storage abstraction
- (via Apache Gora™) for big data stores such as Apache Accumulo™, Apache Avro™, Apache Cassandra™,
- Apache HBase™, HDFS™, an in memory data store and various high profile SQL stores. After some two
- years of development Nutch v2.0 also offers all of the mainstream Nutch functionality and it builds
- on Apache Solr™ adding web-specifics, such as a crawler, a link-graph database and parsing
- support handled by Apache Tika™ for HTML and an array other document formats. Nutch v2.0 shadows
- the latest stable mainstream release (v1.5.X) based on Apache Hadoop™ and covers many use cases
- from small crawls on a single machine to large scale deployments on Hadoop clusters.
- Please see the <a href="http://www.apache.org/dist/nutch/2.0/CHANGES.txt">list of changes</a> made
- in this version for a full breakdown. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>07 July 2012 - Apache Nutch v2.0 Released</h2>
+ <p>The Apache Nutch PMC are very pleased to announce the release of Apache Nutch v2.0. This
+ release offers users an edition focused on large scale crawling which builds on storage abstraction
+ (via Apache Gora™) for big data stores such as Apache Accumulo™, Apache Avro™, Apache Cassandra™,
+ Apache HBase™, HDFS™, an in memory data store and various high profile SQL stores. After some two
+ years of development Nutch v2.0 also offers all of the mainstream Nutch functionality and it builds
+ on Apache Solr™ adding web-specifics, such as a crawler, a link-graph database and parsing
+ support handled by Apache Tika™ for HTML and an array other document formats. Nutch v2.0 shadows
+ the latest stable mainstream release (v1.5.X) based on Apache Hadoop™ and covers many use cases
+ from small crawls on a single machine to large scale deployments on Hadoop clusters.
+ Please see the <a href="http://www.apache.org/dist/nutch/2.0/CHANGES.txt">list of changes</a> made
+ in this version for a full breakdown. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>07 June 2012 - Apache Nutch 1.5 Released</h2>
- <p>The 1.5 release of Nutch is now available. This release includes several improvements
- including upgrades of several major components including Tika 1.1 and Hadoop 1.0.0, improvements to LinkRank and WebGraph elements as well as a number of new plugins covering blacklisting, filering and parsing to name a few. Please see the <a href="http://www.apache.org/dist/nutch/CHANGES-1.5.txt">
- list of changes</a> made in this version for a full breakdown of the 50 odd improvements the release boasts. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>07 June 2012 - Apache Nutch 1.5 Released</h2>
+ <p>The 1.5 release of Nutch is now available. This release includes several improvements
+ including upgrades of several major components including Tika 1.1 and Hadoop 1.0.0, improvements to LinkRank and WebGraph elements as well as a number of new plugins covering blacklisting, filering and parsing to name a few. Please see the <a href="http://www.apache.org/dist/nutch/CHANGES-1.5.txt">
+ list of changes</a> made in this version for a full breakdown of the 50 odd improvements the release boasts. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>26 November 2011 - Apache Nutch 1.4 Released</h2>
- <p>The 1.4 release of Nutch is now available. This release includes several improvements
- including allowing Parsers to declare support for multiple MIME types, configurable Fetcher
- Queue depth, Fetcher speed improvements, tighter Tika integration, and support for HTTP auth in
- Solr indexing. Please see the <a href="http://www.apache.org/dist/nutch/CHANGES-1.4.txt">
- list of changes</a> made in this version. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>26 November 2011 - Apache Nutch 1.4 Released</h2>
+ <p>The 1.4 release of Nutch is now available. This release includes several improvements
+ including allowing Parsers to declare support for multiple MIME types, configurable Fetcher
+ Queue depth, Fetcher speed improvements, tighter Tika integration, and support for HTTP auth in
+ Solr indexing. Please see the <a href="http://www.apache.org/dist/nutch/CHANGES-1.4.txt">
+ list of changes</a> made in this version. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>23 September 2011 - Apache Nutch focuses on 1.x series for main development</h2>
- <p>After some <a href="http://www.mail-archive.com/dev@nutch.apache.org/msg03581.html">discussion</a>
- and a <a href="http://www.mail-archive.com/dev@nutch.apache.org/msg04348.html">vote</a> about the
- issue, the Nutch development community decided to focus their efforts on maintaining and releasing
- the 1.x series of Nutch, and to branch the now former Nutch trunk based on Gora, allowing others to
- try and improve it, while the mainline development goes on.
- </p>
+ <h2>23 September 2011 - Apache Nutch focuses on 1.x series for main development</h2>
+ <p>After some <a href="http://www.mail-archive.com/dev@nutch.apache.org/msg03581.html">discussion</a>
+ and a <a href="http://www.mail-archive.com/dev@nutch.apache.org/msg04348.html">vote</a> about the
+ issue, the Nutch development community decided to focus their efforts on maintaining and releasing
+ the 1.x series of Nutch, and to branch the now former Nutch trunk based on Gora, allowing others to
+ try and improve it, while the mainline development goes on.
+ </p>
</div>
<div class="jumbotron">
- <h2>7 June 2011 - Apache Nutch 1.3 Released</h2>
- <p>The 1.3 release of Nutch is now available. This release includes several improvements
- (improved RSS parsing support, tighter integration with Apache Tika, external parsing support,
- improved language identification and an order of magnitude smaller source release tarball -- only
- about 2MB!). Please see the <a href="http://www.apache.org/dist/nutch/CHANGES-1.3.txt">
- list of changes</a> made in this version. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>7 June 2011 - Apache Nutch 1.3 Released</h2>
+ <p>The 1.3 release of Nutch is now available. This release includes several improvements
+ (improved RSS parsing support, tighter integration with Apache Tika, external parsing support,
+ improved language identification and an order of magnitude smaller source release tarball -- only
+ about 2MB!). Please see the <a href="http://www.apache.org/dist/nutch/CHANGES-1.3.txt">
+ list of changes</a> made in this version. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>24 September 2010 - Apache Nutch 1.2 Released</h2>
- <p>The 1.2 release of Nutch is now available. This release includes several improvements (addition
- of parse-html as a selectable parser again, configurable per-field indexing),
- new features (including adding timing information to all Tool classes, and implementation of
- parser timeouts), and bug fixes (fixing an NPE in distributed search, fixing of XML formatting
- issues per Document fields). Please see the <a href="http://www.apache.org/dist/nutch/CHANGES-1.2.txt">
- list of changes</a> made in this version. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
- </p>
+ <h2>24 September 2010 - Apache Nutch 1.2 Released</h2>
+ <p>The 1.2 release of Nutch is now available. This release includes several improvements (addition
+ of parse-html as a selectable parser again, configurable per-field indexing),
+ new features (including adding timing information to all Tool classes, and implementation of
+ parser timeouts), and bug fixes (fixing an NPE in distributed search, fixing of XML formatting
+ issues per Document fields). Please see the <a href="http://www.apache.org/dist/nutch/CHANGES-1.2.txt">
+ list of changes</a> made in this version. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ </p>
</div>
<div class="jumbotron">
- <h2>06 June 2010 - Apache Nutch 1.1 Released</h2>
- <p>The 1.1 release of Nutch is now available. This release includes several major upgrades of existing
- libraries (Hadoop, Solr, Tika, etc.) on which Nutch depends. Various bug fixes, and speedups (e.g., to
- Fetcher2) have also been included. See <a href="http://www.apache.org/dist/nutch/CHANGES-1.1.txt">
- list of changes</a> made in this version. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
- </div>
-
- <div class="jumbotron">
- <h2>21 April 2010 - Apache Nutch graduates to TLP</h2>
- <p><a href="http://www.apache.org/foundation/records/minutes/2010/board_minutes_2010_04_21.txt">Passed by
- unanimous approval of the Apache Board</a>, Nutch graduated to TLP status. We are in the process of updating
- the website, and moving things around, so if you notice anything out of place, <a href="./mailing_lists.html">
- please let us know.</a></p>
- </div>
-
- <div class="jumbotron">
- <h2>14 August 2009 - Lucene at US ApacheCon</h2>
- <p>
- <a title="ApacheCon US 2009" href="http://www.us.apachecon.com/c/acus2009/">
- <img
- src="http://www.apache.org/events/current-event-125x125.png"
- class="float-right" alt="ApacheCon Logo"/>
- </a>
- ApacheCon US is once again in the Bay Area and Lucene is coming
- along for the ride! The Lucene community has planned two full
- days of talks, plus a meetup and the usual bevy of training.
- With a well-balanced mix of first time and veteran ApacheCon
- speakers, the
- <a href="http://www.us.apachecon.com/c/acus2009/schedule#lucene">Lucene track</a>
- at ApacheCon US promises to have something for everyone. Be sure
- not to miss:
- </p>
- <p> Training:</p>
- <ul>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/437">Lucene Boot Camp</a>
- - A two day training session, Nov. 2nd & 3rd
- </li>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/375">Solr Day</a>
- - A one day training session, Nov. 2nd
- </li>
- </ul>
- <p>Thursday, Nov. 5th</p>
- <ul>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/428">Introduction to the Lucene Ecosystem
- </a>
- - Grant Ingersoll @ 9:00
- </li>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/461">Lucene Basics and New Features</a>
- - Michael Busch @ 10:00
- </li>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/331">Apache Solr: Out of the Box</a>
- - Chris Hostetter @ 14:00
- </li>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/427">Introduction to Nutch</a>
- - Andrzej Bialecki @ 15:00
- </li>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/430">Lucene and Solr Performance Tuning</a>
- - Mark Miller @ 16:30
- </li>
- </ul>
- <p>Friday, Nov. 6th</p>
- <ul>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/332">Implementing an Information Retrieval
- Framework for an Organizational Repository</a>
- - Sithu D Sudarsan @ 9:00
- </li>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/333">Apache Mahout - Going from raw data to
- Information</a>
- - Isabel Drost @ 10:00
- </li>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/334">MIME Magic with Apache Tika</a>
- - Jukka Zitting @ 11:30
- </li>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/335">Building Intelligent Search Applications
- with the Lucene Ecosystem</a>
- - Ted Dunning @ 14:00
- </li>
- <li>
- <a href="http://www.us.apachecon.com/c/acus2009/sessions/462">Realtime Search</a>
- - Jason Rutherglen @ 15:00
- </li>
- </ul>
- </div>
-
-
- <div class="jumbotron">
- <h2>23 March 2009 - Apache Nutch 1.0 Released</h2>
- <p>The 1.0 release of Nutch is now available. This release includes several major feature improvements
- such as new indexing framework, new scoring framework, Apache Solr integration just to mention a few.
- See <a href="http://www.apache.org/dist/nutch/CHANGES-1.0.txt">
- list of changes</a> made in this version. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
- <</div>
+ <h2>06 June 2010 - Apache Nutch 1.1 Released</h2>
+ <p>The 1.1 release of Nutch is now available. This release includes several major upgrades of existing
+ libraries (Hadoop, Solr, Tika, etc.) on which Nutch depends. Various bug fixes, and speedups (e.g., to
+ Fetcher2) have also been included. See <a href="http://www.apache.org/dist/nutch/CHANGES-1.1.txt">
+ list of changes</a> made in this version. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
+ </div>
+
+ <div class="jumbotron">
+ <h2>21 April 2010 - Apache Nutch graduates to TLP</h2>
+ <p><a href="http://www.apache.org/foundation/records/minutes/2010/board_minutes_2010_04_21.txt">Passed by
+ unanimous approval of the Apache Board</a>, Nutch graduated to TLP status. We are in the process of updating
+ the website, and moving things around, so if you notice anything out of place, <a href="./mailing_lists.html">
+ please let us know.</a></p>
+ </div>
+
+ <div class="jumbotron">
+ <h2>14 August 2009 - Lucene at US ApacheCon</h2>
+ <p>
+ <a title="ApacheCon US 2009" href="http://www.us.apachecon.com/c/acus2009/">
+ <img
+ src="http://www.apache.org/events/current-event-125x125.png"
+ class="float-right" alt="ApacheCon Logo"/>
+ </a>
+ ApacheCon US is once again in the Bay Area and Lucene is coming
+ along for the ride! The Lucene community has planned two full
+ days of talks, plus a meetup and the usual bevy of training.
+ With a well-balanced mix of first time and veteran ApacheCon
+ speakers, the
+ <a href="http://www.us.apachecon.com/c/acus2009/schedule#lucene">Lucene track</a>
+ at ApacheCon US promises to have something for everyone. Be sure
+ not to miss:
+ </p>
+ <p> Training:</p>
+ <ul>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/437">Lucene Boot Camp</a>
+ - A two day training session, Nov. 2nd & 3rd
+ </li>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/375">Solr Day</a>
+ - A one day training session, Nov. 2nd
+ </li>
+ </ul>
+ <p>Thursday, Nov. 5th</p>
+ <ul>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/428">Introduction to the Lucene Ecosystem
+ </a>
+ - Grant Ingersoll @ 9:00
+ </li>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/461">Lucene Basics and New Features</a>
+ - Michael Busch @ 10:00
+ </li>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/331">Apache Solr: Out of the Box</a>
+ - Chris Hostetter @ 14:00
+ </li>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/427">Introduction to Nutch</a>
+ - Andrzej Bialecki @ 15:00
+ </li>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/430">Lucene and Solr Performance Tuning</a>
+ - Mark Miller @ 16:30
+ </li>
+ </ul>
+ <p>Friday, Nov. 6th</p>
+ <ul>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/332">Implementing an Information Retrieval
+ Framework for an Organizational Repository</a>
+ - Sithu D Sudarsan @ 9:00
+ </li>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/333">Apache Mahout - Going from raw data to
+ Information</a>
+ - Isabel Drost @ 10:00
+ </li>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/334">MIME Magic with Apache Tika</a>
+ - Jukka Zitting @ 11:30
+ </li>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/335">Building Intelligent Search Applications
+ with the Lucene Ecosystem</a>
+ - Ted Dunning @ 14:00
+ </li>
+ <li>
+ <a href="http://www.us.apachecon.com/c/acus2009/sessions/462">Realtime Search</a>
+ - Jason Rutherglen @ 15:00
+ </li>
+ </ul>
+ </div>
+
+
+ <div class="jumbotron">
+ <h2>23 March 2009 - Apache Nutch 1.0 Released</h2>
+ <p>The 1.0 release of Nutch is now available. This release includes several major feature improvements
+ such as new indexing framework, new scoring framework, Apache Solr integration just to mention a few.
+ See <a href="http://www.apache.org/dist/nutch/CHANGES-1.0.txt">
+ list of changes</a> made in this version. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
+ <</div>
<div class="jumbotron">
<h2>09 February 2009 - Lucene at ApacheCon Europe 2009 in
@@ -440,138 +438,138 @@ under the License.
<li>
<a href="http://eu.apachecon.com/c/aceu2009/sessions/197">Lucene Boot Camp</a>
- A two day training session, March 23 & 24th</li>
- <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/201">Solr Boot Camp</a> - A one day training session, March 24th</li>
- <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/136">Introducing Apache Mahout</a> - Grant Ingersoll. March 25th @ 10:30</li>
- <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/137">Lucene/Solr Case Studies</a> - Erik Hatcher. March 25th @ 11:30</li>
- <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/138">Advanced Indexing Techniques with Apache Lucene</a> - Michael Busch. March 25th @ 14:00</li>
- <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/251">Apache Solr - A Case Study</a> - Uri Boness. March 26th @ 17:30</li>
- <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/250">Best of breed - httpd, forrest, solr and droids</a> - Thorsten Scherler. March 27th @ 17:30</li>
- <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/165">Apache Droids - an intelligent standalone robot framework</a> - Thorsten Scherler. March 26th @ 15:00</li>
-
- </ul>
- </div>
-
- <div class="jumbotron">
- <h2>2 April 2007: Nutch 0.9 Released</h2>
- <p>The 0.9 release of Nutch is now available. This is the second release of Nutch
- based entirely on the underlying Hadoop platform. This release includes several critical
- bug fixes, as well as key speedups described in more detail at
- <a href="http://blog.foofactory.fi/2007/03/twice-speed-half-size.html">Sami Siren's blog</a>.
- See <a href="http://www.apache.org/dist/nutch/CHANGES-0.9.txt">
- list of changes</a> made in this version. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
- </div>
-
- <div class="jumbotron">
- <h2>24 September 2006: Nutch 0.8.1 Released</h2>
- <p>The 0.8.1 release of Nutch is now available. This is a maintenance release to 0.8 branch fixing many serous bugs found in version 0.8.
- See <a href="http://www.apache.org/dist/nutch/CHANGES-0.8.1.txt">
- list of changes</a> made in this version. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
- <div class="jumbotron">
-
- <div class="jumbotron">
- <h2>25 July 2006: Nutch 0.8 Released</h2>
- <p>The 0.8 release of Nutch is now available. This is the first release of Nutch based on
- hadoop architecure. See <a href="http://svn.apache.org/viewvc/nutch/tags/release-0.8/CHANGES.txt?view=markup">
- CHANGES.txt</a> for list of changes made in this version. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
- </div>
-
- <div class="jumbotron">
- <h2>31 March 2006: Nutch 0.7.2 Released</h2>
- <p>The 0.7.2 release of Nutch is now available. This is a bug fix release for 0.7 branch. See
- <a href="http://svn.apache.org/viewcvs.cgi/nutch/branches/branch-0.7/CHANGES.txt?rev=390158">
- CHANGES.txt</a> for details. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
- <div class="jumbotron">
+ <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/201">Solr Boot Camp</a> - A one day training session, March 24th</li>
+ <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/136">Introducing Apache Mahout</a> - Grant Ingersoll. March 25th @ 10:30</li>
+ <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/137">Lucene/Solr Case Studies</a> - Erik Hatcher. March 25th @ 11:30</li>
+ <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/138">Advanced Indexing Techniques with Apache Lucene</a> - Michael Busch. March 25th @ 14:00</li>
+ <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/251">Apache Solr - A Case Study</a> - Uri Boness. March 26th @ 17:30</li>
+ <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/250">Best of breed - httpd, forrest, solr and droids</a> - Thorsten Scherler. March 27th @ 17:30</li>
+ <li><a href="http://eu.apachecon.com/c/aceu2009/sessions/165">Apache Droids - an intelligent standalone robot framework</a> - Thorsten Scherler. March 26th @ 15:00</li>
+
+ </ul>
+ </div>
+
+ <div class="jumbotron">
+ <h2>2 April 2007: Nutch 0.9 Released</h2>
+ <p>The 0.9 release of Nutch is now available. This is the second release of Nutch
+ based entirely on the underlying Hadoop platform. This release includes several critical
+ bug fixes, as well as key speedups described in more detail at
+ <a href="http://blog.foofactory.fi/2007/03/twice-speed-half-size.html">Sami Siren's blog</a>.
+ See <a href="http://www.apache.org/dist/nutch/CHANGES-0.9.txt">
+ list of changes</a> made in this version. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
+ </div>
+
+ <div class="jumbotron">
+ <h2>24 September 2006: Nutch 0.8.1 Released</h2>
+ <p>The 0.8.1 release of Nutch is now available. This is a maintenance release to 0.8 branch fixing many serous bugs found in version 0.8.
+ See <a href="http://www.apache.org/dist/nutch/CHANGES-0.8.1.txt">
+ list of changes</a> made in this version. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
+ <div class="jumbotron">
+
+ <div class="jumbotron">
+ <h2>25 July 2006: Nutch 0.8 Released</h2>
+ <p>The 0.8 release of Nutch is now available. This is the first release of Nutch based on
+ hadoop architecure. See <a href="http://svn.apache.org/viewvc/nutch/tags/release-0.8/CHANGES.txt?view=markup">
+ CHANGES.txt</a> for list of changes made in this version. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
+ </div>
+
+ <div class="jumbotron">
+ <h2>31 March 2006: Nutch 0.7.2 Released</h2>
+ <p>The 0.7.2 release of Nutch is now available. This is a bug fix release for 0.7 branch. See
+ <a href="http://svn.apache.org/viewcvs.cgi/nutch/branches/branch-0.7/CHANGES.txt?rev=390158">
+ CHANGES.txt</a> for details. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
+ <div class="jumbotron">
- <div class="jumbotron">
- <h2>1 October 2005: Nutch 0.7.1 Released</h2>
- <p>The 0.7.1 release of Nutch is now available. This is a bug fix release. See
- <a href="http://svn.apache.org/viewcvs.cgi/nutch/branches/branch-0.7/CHANGES.txt?rev=292986">
- CHANGES.txt</a> for details. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
- </div>
-
- <div class="jumbotron">
- <h2>17 August 2005: Nutch 0.7 Released</h2>
- <p>This is the first Nutch release as an Apache Lucene sub-project. See
- <a href="http://svn.apache.org/viewcvs.cgi/nutch/trunk/CHANGES.txt?rev=233150">
- CHANGES.txt</a> for details. The release is available
- <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
- </div>
-
-
- <div class="jumbotron">
- <h2>June 2005: Nutch graduates from Incubator</h2>
-
- <p>Nutch has now graduated from the Apache incubator, and is now
- a Subproject of Lucene.</p>
-
- </div>
-
- <div class="jumbotron">
- <h2>January 2005: Nutch Joins Apache Incubator</h2>
-
- <p>Nutch is a two-year-old open source project, previously
- hosted at Sourceforge and backed by its own non-profit
- organization. The non-profit was founded in order to assign
- copyright, so that we could retain the right to change the
- license. We have now determined that the Apache license is the
- appropriate license for Nutch and no longer require the
- overhead of an independent non-profit organization. Nutch's
- board of directors and its developers were both polled and
- supported the move to the Apache foundation.</p>
-
- </div>
-
- <div class="jumbotron">
- <h2>September 2004: Creative Commons launches Nutch-based Search</h2>
-
- <p>Creative Commons unveiled a beta version of its search
- engine, which scours the web for text, images, audio, and video
- free to re-use on certain terms a search refinement offered by
- no other company or organization.</p>
-
- <p>See the <a
- href="http://creativecommons.org/press-releases/entry/5064">Creative
- Commons Press Release</a> for more details.</p>
-
- </div>
-
- <div class="jumbotron">
- <h2>September 2004: Oregon State University switches to Nutch</h2>
-
- <p>Oregon State University is converting its searching
- infrastructure from Googletm to the open source project
- Nutch. The effort to replace the Googletm will realize
- significant cost savings for Oregon State University, while
- promoting both the Nutch Search Engine and transparency in
- search engine use and management.</p>
-
- <p>For more details see the announcement by OSU's <a
- href="http://osuosl.org/news_folder/nutch">Open Source
- Lab</a>.</p>
-
- </div>
-
- <!-- Le javascript
- ================================================== -->
- <!-- Placed at the end of the document so the pages load faster -->
- <!-- ShareThis Scripts -->
- <script type="text/javascript">stLight.options({publisher: "4059fafd-3891-49f9-8c96-e4100290d8e6", doNotHash: false, doNotCopy: false, hashAddressBar: false});</script>
- <script>
- var options={ "publisher": "4059fafd-3891-49f9-8c96-e4100290d8e6", "scrollpx": 50, "ad": { "visible": false}, "chicklets": { "items": ["sharethis", "facebook", "twitter", "googleplus", "wordpress", "tumblr", "stumbleupon", "reddit", "delicious", "blogger", "linkedin", "pinterest", "email"]}};
- var st_pulldown_widget = new sharethis.widgets.pulldownbar(options);
- </script>
- <!-- End of ShareThis -->
- <script>
- !function ($) {
- $(function(){
- // carousel demo
- $('#myCarousel').carousel()
- })
- }(window.jQuery)
- </script>
- <script src="./assets/js/holder/holder.js"></script>
+ <div class="jumbotron">
+ <h2>1 October 2005: Nutch 0.7.1 Released</h2>
+ <p>The 0.7.1 release of Nutch is now available. This is a bug fix release. See
+ <a href="http://svn.apache.org/viewcvs.cgi/nutch/branches/branch-0.7/CHANGES.txt?rev=292986">
+ CHANGES.txt</a> for details. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
+ </div>
+
+ <div class="jumbotron">
+ <h2>17 August 2005: Nutch 0.7 Released</h2>
+ <p>This is the first Nutch release as an Apache Lucene sub-project. See
+ <a href="http://svn.apache.org/viewcvs.cgi/nutch/trunk/CHANGES.txt?rev=233150">
+ CHANGES.txt</a> for details. The release is available
+ <a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.</p>
+ </div>
+
+
+ <div class="jumbotron">
+ <h2>June 2005: Nutch graduates from Incubator</h2>
+
+ <p>Nutch has now graduated from the Apache incubator, and is now
+ a Subproject of Lucene.</p>
+
+ </div>
+
+ <div class="jumbotron">
+ <h2>January 2005: Nutch Joins Apache Incubator</h2>
+
+ <p>Nutch is a two-year-old open source project, previously
+ hosted at Sourceforge and backed by its own non-profit
+ organization. The non-profit was founded in order to assign
+ copyright, so that we could retain the right to change the
+ license. We have now determined that the Apache license is the
+ appropriate license for Nutch and no longer require the
+ overhead of an independent non-profit organization. Nutch's
+ board of directors and its developers were both polled and
+ supported the move to the Apache foundation.</p>
+
+ </div>
+
+ <div class="jumbotron">
+ <h2>September 2004: Creative Commons launches Nutch-based Search</h2>
+
+ <p>Creative Commons unveiled a beta version of its search
+ engine, which scours the web for text, images, audio, and video
+ free to re-use on certain terms a search refinement offered by
+ no other company or organization.</p>
+
+ <p>See the <a
+ href="http://creativecommons.org/press-releases/entry/5064">Creative
+ Commons Press Release</a> for more details.</p>
+
+ </div>
+
+ <div class="jumbotron">
+ <h2>September 2004: Oregon State University switches to Nutch</h2>
+
+ <p>Oregon State University is converting its searching
+ infrastructure from Googletm to the open source project
+ Nutch. The effort to replace the Googletm will realize
+ significant cost savings for Oregon State University, while
+ promoting both the Nutch Search Engine and transparency in
+ search engine use and management.</p>
+
+ <p>For more details see the announcement by OSU's <a
+ href="http://osuosl.org/news_folder/nutch">Open Source
+ Lab</a>.</p>
+
+ </div>
+
+ <!-- Le javascript
+ ================================================== -->
+ <!-- Placed at the end of the document so the pages load faster -->
+ <!-- ShareThis Scripts -->
+ <script type="text/javascript">stLight.options({publisher: "4059fafd-3891-49f9-8c96-e4100290d8e6", doNotHash: false, doNotCopy: false, hashAddressBar: false});</script>
+ <script>
+ var options={ "publisher": "4059fafd-3891-49f9-8c96-e4100290d8e6", "scrollpx": 50, "ad": { "visible": false}, "chicklets": { "items": ["sharethis", "facebook", "twitter", "googleplus", "wordpress", "tumblr", "stumbleupon", "reddit", "delicious", "blogger", "linkedin", "pinterest", "email"]}};
+ var st_pulldown_widget = new sharethis.widgets.pulldownbar(options);
+ </script>
+ <!-- End of ShareThis -->
+ <script>
+ !function ($) {
+ $(function(){
+ // carousel demo
+ $('#myCarousel').carousel()
+ })
+ }(window.jQuery)
+ </script>
+ <script src="./assets/js/holder/holder.js"></script>