Posted to user@nutch.apache.org by Alex McLintock <al...@gmail.com> on 2010/06/10 19:17:28 UTC

HBase and RC 1.1 and plugins

I'm not exactly new to Nutch, but haven't used it for a year or so.
I'm a bit out of touch with current "state of the art".

I see there is some HBase code in the form of some patches. I don't
know whether this is more than "proof of concept" stuff.

I also see that there is a 1.1 release candidate in the works.

However, I can see no mention of HBase in the release candidate. Is it
there at all?

If I use Nutch I am going to have to develop several plugins of my own
and perhaps change the way that URLs are found for second and
subsequent crawls. I think that HBase would significantly help with
this.


References:
http://www.gossamer-threads.com/lists/lucene/general/99072 [VOTE]
Apache Nutch 1.1 Release Candidate #2
and
http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/CHANGES-1.1.txt
and
https://issues.apache.org/jira/browse/NUTCH-650

Re: HBase and RC 1.1 and plugins

Posted by Kula <ku...@gmail.com>.
I am also interested in HBase with Nutch.

2010/6/11 Doğacan Güney <do...@gmail.com>

> Hi,
>
> On Thu, Jun 10, 2010 at 20:17, Alex McLintock <alex.mclintock@gmail.com> wrote:
>
> > I'm not exactly new to Nutch, but haven't used it for a year or so.
> > I'm a bit out of touch with current "state of the art".
> >
> > I see there is some HBase code in the form of some patches. I don't
> > know whether this is more than "proof of concept" stuff.
> >
> > I also see that there is a 1.1 release candidate in the works.
> >
> > However, I can see no mention of HBase in the release candidate. Is it
> > there at all?
> >
> > If I use Nutch I am going to have to develop several plugins of my own
> > and perhaps change the way that URLs are found for second and
> > subsequent crawls. I think that HBase would significantly help with
> > this.
> >
> >
> > References:
> > http://www.gossamer-threads.com/lists/lucene/general/99072 [VOTE]
> > Apache Nutch  1.1 Release Candidate #2
> > and
> > http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/CHANGES-1.1.txt
> > and
> > https://issues.apache.org/jira/browse/NUTCH-650
> >
>
> Nutch-HBase integration is still on track, but development slowed down a
> lot for a while. It is currently picking up speed again, and early next
> week I will send an email explaining the current situation; then we can
> discuss next steps from there. FWIW, my goal is to finish it for Nutch 2.0.
>
> --
> Doğacan Güney
>

Re: HBase and RC 1.1 and plugins

Posted by Doğacan Güney <do...@gmail.com>.
Hi,

On Thu, Jun 10, 2010 at 20:17, Alex McLintock <al...@gmail.com> wrote:

> I'm not exactly new to Nutch, but haven't used it for a year or so.
> I'm a bit out of touch with current "state of the art".
>
> I see there is some HBase code in the form of some patches. I don't
> know whether this is more than "proof of concept" stuff.
>
> I also see that there is a 1.1 release candidate in the works.
>
> However, I can see no mention of HBase in the release candidate. Is it
> there at all?
>
> If I use Nutch I am going to have to develop several plugins of my own
> and perhaps change the way that URLs are found for second and
> subsequent crawls. I think that HBase would significantly help with
> this.
>
>
> References:
> http://www.gossamer-threads.com/lists/lucene/general/99072 [VOTE]
> Apache Nutch  1.1 Release Candidate #2
> and
> http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/CHANGES-1.1.txt
> and
> https://issues.apache.org/jira/browse/NUTCH-650
>

Nutch-HBase integration is still on track, but development slowed down a
lot for a while. It is currently picking up speed again, and early next
week I will send an email explaining the current situation; then we can
discuss next steps from there. FWIW, my goal is to finish it for Nutch 2.0.

-- 
Doğacan Güney