Posted to user@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2013/07/02 18:32:00 UTC

[ANNOUNCE] Apache Nutch v2.2.1 Released

Good Afternoon Everyone,

The Apache Nutch PMC are very pleased to announce the immediate release of
Apache Nutch v2.2.1. We advise all current users and developers of the 2.X
series to upgrade to this release ASAP.

Apache Nutch is an open source web-search software project. Stemming from
Apache Lucene <http://lucene.apache.org/java/>, it now builds on Apache
Solr <http://lucene.apache.org/solr/>, adding web-specifics such as a
crawler, a link-graph database, and parsing support handled by Apache
Tika <http://tika.apache.org/> for HTML and an array of other document
formats.

Although this release includes library upgrades to Apache
Hadoop <http://hadoop.apache.org> 1.2.0 and Apache
Tika <http://tika.apache.org> 1.3, it is predominantly a bug fix for
NUTCH-1591 - Incorrect conversion of ByteBuffer to
String <https://issues.apache.org/jira/browse/NUTCH-1591>.
Please see the list of
changes <http://www.apache.org/dist/nutch/2.2.1/2.2.1-CHANGES.txt> for a
full breakdown. As usual in the 2.x series, this release is made
available only as source, but is also available within Maven
Central <http://search.maven.org/>.
The release is available here <http://www.apache.org/dyn/closer.cgi/nutch/>.


Have a great day

Best
lewismc
(on behalf of the Apache Nutch community)

-- 
*Lewis*

Re: [ANNOUNCE] Apache Nutch v2.2.1 Released

Posted by Julien Nioche <li...@gmail.com>.
Great stuff! Thanks Lewis


On 2 July 2013 17:32, Lewis John Mcgibbney <le...@gmail.com>wrote:

> Good Afternoon Everyone,
>
> The Apache Nutch PMC are very pleased to announce the immediate release of
> Apache Nutch v2.2.1, we advise all current users and developers of the 2.X
> series to upgrade to this release ASAP.
>
> Apache Nutch is an open source web-search software project. Stemming
> from Apache
> Lucene <http://lucene.apache.org/java/>, it now builds on Apache
> Solr<http://lucene.apache.org/solr/>adding web-specifics, such as a
> crawler, a link-graph database and parsing
> support handled by Apache Tika <http://tika.apache.org/> for HTML and an
> array of other document formats.
>
> Although this release includes library upgrades to Apache
> Hadoop<http://hadoop.apache.org>1.2.0 and Apache
> Tika <http://tika.apache.org> 1.3, it is predominantly a bug fix for
> NUTCH-1591
> - Incorrect conversion of ByteBuffer to
> String<https://issues.apache.org/jira/browse/NUTCH-1591>.
> Please see the list of
> changes<http://www.apache.org/dist/nutch/2.2.1/2.2.1-CHANGES.txt>for a
> full breakdown. As usual in the 2.x series, this release is made
> available only as source, but is also available within Maven
> Central<http://search.maven.org/>.
> The release is available here <http://www.apache.org/dyn/closer.cgi/nutch/
> >.
>
>
> Have a great day
>
> Best
> lewismc
> (on behalf of the Apache Nutch community)
>
> --
> *Lewis*
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: [ANNOUNCE] Apache Nutch v2.2.1 Released

Posted by glumet <ja...@gmail.com>.
OK, my fault... in case anybody is interested in the correct solution:

You must add the correct version of gora-hbase to your libraries:
gora-hbase-0.3.jar (I had used 0.2.1).
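
[Editor's sketch] The fix above amounts to making the gora-hbase jar match the version of the other Gora jars. One way to catch this kind of mismatch early is to compare the version suffixes of the gora-* jar file names. The class name GoraJarVersions, the file-name pattern, and the default directory "lib" are all assumptions made for illustration; they are not part of Nutch or Gora.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GoraJarVersions {
    // Assumed naming scheme: gora-<module>-<version>.jar, e.g. gora-hbase-0.3.jar
    private static final Pattern GORA_JAR =
            Pattern.compile("(gora-[a-z]+)-(\\d.*)\\.jar");

    /** Maps each gora module found in the list to the version in its file name. */
    public static Map<String, String> versions(List<String> jarNames) {
        Map<String, String> found = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = GORA_JAR.matcher(name);
            if (m.matches()) {
                found.put(m.group(1), m.group(2));
            }
        }
        return found;
    }

    /** True when all gora jars in the list carry the same version string. */
    public static boolean consistent(List<String> jarNames) {
        return new HashSet<>(versions(jarNames).values()).size() <= 1;
    }

    public static void main(String[] args) {
        // "lib" is only a placeholder default; pass your own library directory.
        String[] names = new File(args.length > 0 ? args[0] : "lib").list();
        List<String> jars = (names == null)
                ? Collections.<String>emptyList()
                : Arrays.asList(names);
        System.out.println(versions(jars));
        System.out.println(consistent(jars) ? "gora versions match"
                                            : "gora versions differ");
    }
}
```

Run it with the directory that holds your jars as the only argument. It only inspects file names, so it will not notice a jar that was renamed.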



--
View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Apache-Nutch-v2-2-1-Released-tp4074798p4075575.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: [ANNOUNCE] Apache Nutch v2.2.1 Released

Posted by glumet <ja...@gmail.com>.
And this is the output from hadoop.log

2013-07-04 16:12:05,069 WARN  mapred.LocalJobRunner - job_local1522971864_0001
java.lang.Exception: java.lang.NoSuchMethodError: org.apache.gora.persistency.Persistent.getSchema()Lorg/apache/avro/Schema;
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NoSuchMethodError: org.apache.gora.persistency.Persistent.getSchema()Lorg/apache/avro/Schema;
	at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:177)
	at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:191)
	at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:88)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
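
[Editor's sketch] A NoSuchMethodError like the one above usually means a class was compiled against a different version of a library than the one loaded at runtime. The small probe below uses only standard Java reflection to report whether a class on the classpath has a given no-argument method, and which jar the class was loaded from. The class name MethodProbe and its report format are made up for illustration; the class and method names in main are taken from the trace.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

public class MethodProbe {
    /**
     * Reports whether className exposes a public no-argument method called
     * methodName, together with the location the class was loaded from.
     */
    public static String probe(String className, String methodName) {
        try {
            // Load without running static initializers; we only inspect the class.
            Class<?> cls = Class.forName(className, false,
                    MethodProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String origin = (src == null || src.getLocation() == null)
                    ? "bootstrap/unknown"
                    : src.getLocation().toString();
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName) && m.getParameterCount() == 0) {
                    return "found " + m.getReturnType().getName() + " "
                            + methodName + "() in " + origin;
                }
            }
            return "missing " + methodName + "() in " + origin;
        } catch (ClassNotFoundException e) {
            return "class not found: " + className;
        }
    }

    public static void main(String[] args) {
        // Names copied from the stack trace above.
        System.out.println(
                probe("org.apache.gora.persistency.Persistent", "getSchema"));
    }
}
```

Run it with the same classpath as the failing job. A "missing" report names the jar that supplied the class, which is the one to compare against the version the caller expects.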



--
View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Apache-Nutch-v2-2-1-Released-tp4074798p4075502.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: [ANNOUNCE] Apache Nutch v2.2.1 Released

Posted by glumet <ja...@gmail.com>.
Unfortunately, I have to report that the new version is not working for me.

It already fails while injecting:

InjectorJob: starting at 2013-07-04 15:15:01
InjectorJob: Injecting urlDir: /opt/ir/nutch2/urls
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: java.lang.RuntimeException: job failed: name=[newwebpage]inject /opt/ir/nutch2/urls, jobid=job_local2141339932_0001
	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
	at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
	at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
	at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)




--
View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Apache-Nutch-v2-2-1-Released-tp4074798p4075488.html
Sent from the Nutch - User mailing list archive at Nabble.com.

RE: [ANNOUNCE] Apache Nutch v2.2.1 Released

Posted by Markus Jelsma <ma...@openindex.io>.
Great news, thanks Lewis!

-----Original message-----
From: Lewis John Mcgibbney<le...@gmail.com>
Sent: Tuesday 2nd July 2013 18:32
To: user@nutch.apache.org; dev@nutch.apache.org
Subject: [ANNOUNCE] Apache Nutch v2.2.1 Released

Good Afternoon Everyone,

The Apache Nutch PMC are very pleased to announce the immediate release of Apache Nutch v2.2.1. We advise all
current users and developers of the 2.X series to upgrade to this release ASAP.

Apache Nutch is an open source web-search software project. Stemming from
Apache Lucene <http://lucene.apache.org/java/>, it now builds on
Apache Solr <http://lucene.apache.org/solr/>, adding web-specifics such as a crawler,
a link-graph database, and parsing support handled by Apache Tika <http://tika.apache.org/>
for HTML and an array of other document formats.

Although this release includes library upgrades to Apache Hadoop <http://hadoop.apache.org> 1.2.0 and
Apache Tika <http://tika.apache.org> 1.3, it is predominantly a bug fix for
NUTCH-1591 - Incorrect conversion of ByteBuffer to String <https://issues.apache.org/jira/browse/NUTCH-1591>.
Please see the list of changes <http://www.apache.org/dist/nutch/2.2.1/2.2.1-CHANGES.txt> for a full
breakdown.
As usual in the 2.x series, this release is made available only as source, but is also available within
Maven Central <http://search.maven.org/>.
The release is available here <http://www.apache.org/dyn/closer.cgi/nutch/>.

Have a great day

Best

lewismc

(on behalf of the Apache Nutch community)
-- 
Lewis


