Posted to user@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2015/08/19 00:29:26 UTC

Re: Nutch 2.3 : Backend datastorage problem

Hi Alexandre,

Apologies for taking a hellishly long time to pick up this message!
The 2.X branch currently needs some attention and major upgrades to key
dependencies. These are inherited through its dependency on Apache Gora:
we need to release Apache Gora 0.6.1, which will let Nutch 2.X users move
forward and upgrade many of these critical storage and processing
components.
My main plea at this stage: if you are committed to using 2.X, please
help us. We need people to maintain the code and develop it as well.
Thanks
Lewis

On Fri, Jul 10, 2015 at 4:43 AM, <us...@nutch.apache.org> wrote:

>
> I'm using Nutch 2.3 in order to crawl some websites and index them into
> ElasticSearch 1.3.
> My problem is the storage backend. I have tested:
> - HBase 0.94.14: works well; however, the fully distributed cluster uses
> HBase 1.0.0, and I can't use it because Nutch is not compatible with that
> version.
> - Accumulo 1.6 on Cloudera: no errors from Nutch, but nothing appears to
> be written to the database... I'm not sure it works.
> - Cassandra 2.0.2: it creates the keyspace and tables correctly, but the
> crawl doesn't succeed, and I think the problem is on the Nutch side...
> So... I would like to know what I can do now.
> I have a cluster with Hadoop 2.6.0, HBase 1.0.0 and Accumulo 1.6
> (fully distributed). I am not sure that Nutch 2.3 works in this
> environment, but I would welcome any advice you have to help me...
> Thank you for reading.
> Sincerely
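The backend question in this thread comes down to which Gora store Nutch 2.x is configured to use. The sketch below, in Python, builds the nutch-site.xml property that selects that store. Only the property name `storage.data.store.class` and the Gora store class names are taken from standard Nutch 2.x / Gora setups; the helper `build_nutch_site` and the `GORA_STORES` table are hypothetical, and you should check the class names against the Gora modules in your own build. Switching the property does not by itself fix the HBase 1.0.0 incompatibility reported above; that depends on the Gora upgrade Lewis mentions.

```python
import xml.etree.ElementTree as ET

# Gora store classes for the backends discussed in this thread.
# Assumption: class names match the Gora modules bundled with Nutch 2.3;
# verify against your conf/gora.properties and ivy/ivy.xml.
GORA_STORES = {
    "hbase": "org.apache.gora.hbase.store.HBaseStore",
    "cassandra": "org.apache.gora.cassandra.store.CassandraStore",
    "accumulo": "org.apache.gora.accumulo.store.AccumuloStore",
}


def build_nutch_site(backend: str) -> str:
    """Return a nutch-site.xml snippet selecting the given Gora backend.

    Hypothetical helper for illustration; only the property name
    storage.data.store.class comes from Nutch 2.x configuration.
    """
    try:
        store_class = GORA_STORES[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    # <configuration><property><name>...</name><value>...</value></property></configuration>
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "storage.data.store.class"
    ET.SubElement(prop, "value").text = store_class
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    print(build_nutch_site("hbase"))
```

Generating the snippet from a small table keeps the backend choice in one place; in practice most people simply edit nutch-site.xml by hand.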