Posted to general@lucene.apache.org by Yingfeng Zhang <yi...@gmail.com> on 2008/04/09 11:34:19 UTC

Lucene on Hadoop

Hi,
From this URL
http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg00998.html
I see that Hadoop is not suitable for incremental updates if the inverted
files are stored on it. What's more, Nutch has adopted Hadoop, which means
the incremental update ability provided by Lucene will not work in Nutch.

It is also suggested to experiment with HBase, which is a BigTable-like
store (BigTable itself being built on GFS). Since HBase is built on Hadoop,
what difference does it make to use HBase for incremental indexing?
Thanks a lot for your attention.


Best
Yingfeng

Re: Lucene on Hadoop

Posted by Chris Hostetter <ho...@fucit.org>.

: From this URL
: http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg00998.html
: I see that Hadoop is not suitable for incremental updates if the inverted
: files are stored on it. What's more, Nutch has adopted Hadoop, which means
: the incremental update ability provided by Lucene will not work in Nutch.

Questions about Nutch's use of Lucene and HDFS are best addressed on a
Nutch-specific list.

: It is also suggested to experiment with HBase, which is a BigTable-like
: store (BigTable itself being built on GFS). Since HBase is built on Hadoop,
: what difference does it make to use HBase for incremental indexing?
: Thanks a lot for your attention.

While I'm not an expert on HBase, in the single message you linked to I
see the comment: "...HBase is designed to be a much more scalable,
incrementally updateable DB than BDB or relational DBs...", which suggests
to me that while HBase may be built on HDFS, its API abstraction may
provide "Files" that do allow incremental updates.
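
A toy Python sketch of that idea, purely as an illustration and not HBase's
actual API or implementation: an abstraction layer can accept incremental
updates even when the underlying files are never modified after they are
written. The names WriteOnceStore and IncrementalTable, and the flush
threshold, are hypothetical and chosen only for this example.

```python
class WriteOnceStore:
    """Stand-in for a file system whose files cannot be changed once written."""

    def __init__(self):
        self._files = {}

    def create(self, name, records):
        if name in self._files:
            raise FileExistsError(name)  # no in-place rewrites allowed
        self._files[name] = tuple(records)  # frozen after creation

    def read(self, name):
        return self._files[name]


class IncrementalTable:
    """Hypothetical table offering row-level updates on top of WriteOnceStore."""

    def __init__(self, store, flush_threshold=2):
        self.store = store
        self.flush_threshold = flush_threshold
        self.buffer = {}    # recent updates, held in memory
        self.segments = []  # names of immutable files, oldest first

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write buffered updates out as one new immutable file."""
        if not self.buffer:
            return
        name = f"segment-{len(self.segments):05d}"
        self.store.create(name, sorted(self.buffer.items()))
        self.segments.append(name)
        self.buffer = {}

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        # Search newest segment first so later writes shadow earlier ones.
        for name in reversed(self.segments):
            for k, v in self.store.read(name):
                if k == key:
                    return v
        return None


store = WriteOnceStore()
table = IncrementalTable(store)
table.put("doc1", "v1")
table.put("doc2", "v1")   # buffer reaches the threshold and is flushed
table.put("doc1", "v2")   # the update lands in the buffer; no file is rewritten
print(table.get("doc1"))  # v2
print(table.get("doc2"))  # v1
```

In this sketch an update never touches an existing file: it is buffered, later
written out as a new file, and reads consult the newest data first.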


-Hoss