Posted to user@nutch.apache.org by Hua Su <hu...@gmail.com> on 2010/02/08 10:32:13 UTC

About HBase Integration

Hi all,

Any recent progress on HBase integration? There is a filed issue,
NUTCH-650 <http://issues.apache.org/jira/browse/NUTCH-650>.

I really love the idea of using HBase as the Nutch storage backend. It not only
simplifies Nutch storage, but also makes much of the URL/page processing work
more efficient thanks to two HBase features: data is mutable, and it is indexed
by keys/columns/timestamps.
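
To make the point about mutability and key/column/timestamp indexing
concrete, here is a minimal Python sketch. It is a toy in-memory model, not
the HBase or Nutch API; the class name, the "fetch:status" column, and the
values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyVersionedTable:
    """Toy stand-in for a table addressed by (row key, column, timestamp).

    This is NOT the HBase client API, only an illustration of the data model.
    """

    def __init__(self):
        # row key -> column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        """Write one cell version; other rows and columns are left untouched."""
        self._rows[row][column][ts] = value

    def get(self, row, column, ts=None):
        """Return the newest value at or before ts (newest overall if ts is None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None


# Hypothetical usage: one URL is the row key, its fetch status is a column.
table = ToyVersionedTable()
url = "http://example.com/"
table.put(url, "fetch:status", "unfetched", ts=1)
table.put(url, "fetch:status", "fetched", ts=2)  # update a single cell in place

latest = table.get(url, "fetch:status")
earlier = table.get(url, "fetch:status", ts=1)
```

In this model, updating one URL's status touches a single cell instead of
rewriting a whole batch of records, and older versions remain readable by
timestamp.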

The issue has been open for a long time (about 17 months). Is there any plan
to close this issue and release a Nutch version with HBase enabled?

Best,
Hua

Re: About HBase Integration

Posted by xiao yang <ya...@gmail.com>.
Hi, Dogacan,

I'm quite confused by the Avro design nutchbase is using. The HBase
schema is defined both in /org/apache/nutch/storage/NutchFields.java
(http://github.com/dogacan/nutchbase/blob/master/src/java/org/apache/nutch/storage/NutchFields.java)
and in /webtable.json
(http://github.com/dogacan/nutchbase/blob/master/webtable.json), so
why use webtable.json? What's the benefit?

Thanks!
Xiao

On Tue, Feb 9, 2010 at 6:12 PM, Hua Su <hu...@gmail.com> wrote:
> Hi,
>
> I notice the repository has not been updated since last Christmas. Is that
> work still in progress?
>
> Best,
> Hua
>
> On Tue, Feb 9, 2010 at 4:23 PM, Andrzej Bialecki <ab...@getopt.org> wrote:
>
>> On 2010-02-09 03:08, Hua Su wrote:
>>
>>> Thanks. But heritrix is another project, right?
>>>
>>
>>
>> Please see this Git repository, it contains the latest work in progress on
>> Nutch+HBase:
>>
>> git://github.com/dogacan/nutchbase.git
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
>

Re: About HBase Integration

Posted by Hua Su <hu...@gmail.com>.
Hi,

I notice the repository has not been updated since last Christmas. Is that
work still in progress?

Best,
Hua

On Tue, Feb 9, 2010 at 4:23 PM, Andrzej Bialecki <ab...@getopt.org> wrote:

> On 2010-02-09 03:08, Hua Su wrote:
>
>> Thanks. But heritrix is another project, right?
>>
>
>
> Please see this Git repository, it contains the latest work in progress on
> Nutch+HBase:
>
> git://github.com/dogacan/nutchbase.git
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>

Re: About HBase Integration

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 2010-02-09 03:08, Hua Su wrote:
> Thanks. But heritrix is another project, right?


Please see this Git repository; it contains the latest work in progress
on Nutch+HBase:

git://github.com/dogacan/nutchbase.git

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: About HBase Integration

Posted by Hua Su <hu...@gmail.com>.
Thanks. But Heritrix is another project, right?

Is there any plan for HBase support in Nutch?

On Mon, Feb 8, 2010 at 5:45 PM, Ryan Smith <ry...@gmail.com> wrote:

> FWIW, there is a plugin for heritrix to write to hbase as a back end store.
> Maybe it will help for making a nutch plugin?
>
> http://code.google.com/p/hbase-writer
>
> -Ryan
>
> On Mon, Feb 8, 2010 at 4:32 AM, Hua Su <hu...@gmail.com> wrote:
>
> > Hi all,
> >
> > Any recent progress on HBase integration? There is a filed issue
> > NUTCH-650 <http://issues.apache.org/jira/browse/NUTCH-650>.
> >
> > I really love the idea of using HBase as nutch storage backend. It not only
> > simplifies nutch storage, but also makes much url/page processing work more
> > efficient due to the features of HBase: HBase data is mutable and indexed
> > by keys/columns/timestamps.
> >
> > The issue has been open for a long time (about 17 months). Is there any plan
> > to close this issue and release a nutch version with hbase enabled?
> >
> > Best,
> > Hua
> >
>

Re: About HBase Integration

Posted by Ryan Smith <ry...@gmail.com>.
FWIW, there is a plugin for Heritrix that writes to HBase as a back-end store.
Maybe it will help in making a Nutch plugin?

http://code.google.com/p/hbase-writer

-Ryan

On Mon, Feb 8, 2010 at 4:32 AM, Hua Su <hu...@gmail.com> wrote:

> Hi all,
>
> Any recent progress on HBase integration? There is a filed issue
> NUTCH-650 <http://issues.apache.org/jira/browse/NUTCH-650>.
>
> I really love the idea of using HBase as nutch storage backend. It not only
> simplifies nutch storage, but also makes much url/page processing work more
> efficient due to the features of HBase: HBase data is mutable and indexed
> by keys/columns/timestamps.
>
> The issue has been open for a long time (about 17 months). Is there any plan
> to close this issue and release a nutch version with hbase enabled?
>
> Best,
> Hua
>