Posted to general@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/01/04 01:07:29 UTC

Re: Help me to understanding how does DataBase work in Search engine.

: I have a project Search engine. My part is :
:  1. Creat dataBase for Search engine
:  2. indexing and searching 

: My friend told me need indexing all webpages then save all files were
: indexed in DataBase.
: then I don't know which is right.

databases can build internal "indexes" on tables to make certain 
queries faster ... so if you have a database of webpages you can build an 
index on something like a "size" field to make searching for pages by size 
faster.
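
as a rough illustration of that kind of index, here is a minimal sketch using 
Python's built-in sqlite3 module.  The "pages" table, its columns, the 
example.com URLs and the index name are all made up for the example, they 
are not from any real schema.

```python
import sqlite3


def build_page_db():
    """Create an in-memory database with a hypothetical 'pages' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (url TEXT PRIMARY KEY, size INTEGER, body TEXT)"
    )
    # Made-up sample rows, purely for illustration.
    conn.executemany(
        "INSERT INTO pages (url, size, body) VALUES (?, ?, ?)",
        [
            ("http://example.com/a", 1200, "lucene is a search library"),
            ("http://example.com/b", 54000, "nutch crawls and indexes webpages"),
            ("http://example.com/c", 880, "solr is a search application"),
        ],
    )
    # The internal index on the "size" column: it lets the database find
    # rows by size without reading every row in the table.
    conn.execute("CREATE INDEX idx_pages_size ON pages (size)")
    return conn


def pages_larger_than(conn, min_size):
    """Return the URLs of pages bigger than min_size, smallest first."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE size > ? ORDER BY size", (min_size,)
    )
    return [url for (url,) in rows]


conn = build_page_db()
print(pages_larger_than(conn, 1000))
# ['http://example.com/a', 'http://example.com/b']
```

on three rows the index makes no measurable difference, the point is only 
to show where it fits: it is declared once on the table and the database 
decides when to use it for a query.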

some databases have a feature called a "fulltext" index that can be built 
on text columns to make searching for words faster than doing simple "LIKE" 
queries.  This can work in some use cases, but these database "fulltext" 
indexes tend to be very limiting and not easy to customize.
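
to give a feel for why a word-based index beats a "LIKE" scan, here is a 
toy inverted index in Python.  This is only a sketch of the general idea, 
not how any particular database or Lucene implements it: the function 
names and sample documents are made up, and the tokenizing is just 
lowercasing plus splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_all_words(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the documents matching the first word, then narrow down.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up sample documents, keyed by a hypothetical document id.
docs = {
    1: "Lucene is a search library",
    2: "Nutch crawls and indexes webpages",
    3: "Solr is a search application",
}
index = build_inverted_index(docs)
print(sorted(search_all_words(index, "search is")))
# [1, 3]
```

a "LIKE" query has to read the text of every row, while the lookup above 
only touches the entries for the words in the query.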

based on what you've described, a couple of Lucene subprojects might be 
useful to you...

http://lucene.apache.org/nutch/
Nutch is specifically designed to crawl and index webpages.

http://lucene.apache.org/solr/
Solr is a search "application" that lets you index/query content using 
any language over HTTP.  It comes with a DataImportHandler plugin that 
lets you automatically index databases using configuration to describe how 
to fetch the logical contents of each "document".
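
since Solr is queried over plain HTTP, a client mostly just has to build a 
URL.  The sketch below only constructs a query URL and does not contact 
any server.  The base URL (host, port and path) and the "title" field are 
assumptions for illustration, so adjust them to match your own 
installation and schema.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, query, rows=10):
    """Build a Solr query URL from a base URL and a query string.

    base_url is an assumption: it depends on where Solr is deployed.
    """
    params = {
        "q": query,    # the search query
        "rows": rows,  # maximum number of results to return
        "wt": "json",  # ask for a JSON response
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical local install and field name.
print(solr_select_url("http://localhost:8983/solr", "title:lucene"))
# http://localhost:8983/solr/select?q=title%3Alucene&rows=10&wt=json
```

any HTTP client in any language can then fetch that URL and parse the 
response.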

http://lucene.apache.org/java/
Lucene-Java is the underlying search library used in both Nutch and Solr, 
if you want to build custom search logic you can use this library 
directly.  As you mentioned, there is also a Hibernate project for 
integrating with Lucene.

if you have follow-up questions about any of those 3 subprojects, please 
consult the specific user mailing list for the project that you are 
interested in.


-Hoss