Posted to general@lucene.apache.org by ppuyen <kh...@mail.ru> on 2008/12/27 22:34:44 UTC

Help me to understanding how does DataBase work in Search engine.

Hello everyone,

I am working on a search engine project. My part is:
 1. Create a database for the search engine
 2. Indexing and searching

    First, my friend runs a crawler to save web pages from the internet, and
then I use these web pages to create a database in Oracle (I want the
database to be optimized for search).
    Second, I use Hibernate to connect the database to Java.
    Then I index and search with Lucene.

This is only what I have read and understood from the internet (my English is
very bad, so I am afraid I may have misunderstood).
My friend told me that we need to index all the web pages first and then save
all the indexed files in the database.
So I don't know which approach is right.

Can anyone tell me the right way? Or if there is another way, please tell me.
Thanks


Re: Help me to understanding how does DataBase work in Search engine.

Posted by Chris Hostetter <ho...@fucit.org>.

: I am working on a search engine project. My part is:
:  1. Create a database for the search engine
:  2. Indexing and searching

: My friend told me that we need to index all the web pages first and then save
: all the indexed files in the database.
: So I don't know which approach is right.

databases can build internal "indexes" on tables to make certain 
queries faster ... so if you have a database of webpages you can build an 
index on something like a "size" field to make searching for pages by size 
faster.
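
As a rough illustration of that kind of internal index, here is a minimal sketch using Python's built-in sqlite3 module instead of Oracle. The table name "pages", its columns, the index name and the sample rows are all made up for the example; they are not taken from the original project.

```python
import sqlite3

# In-memory database standing in for the real one (assumption: the original
# project uses Oracle; SQLite is used here only to keep the sketch runnable).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, size INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO pages (url, size, body) VALUES (?, ?, ?)",
    [
        ("http://example.com/a", 2048, "small page"),
        ("http://example.com/b", 50000, "large page"),
        ("http://example.com/c", 12000, "medium page"),
    ],
)

# An ordinary database index on the hypothetical "size" column.
conn.execute("CREATE INDEX idx_pages_size ON pages (size)")

# Range query on size; the database may use the index to avoid a full scan.
large = [
    row[0]
    for row in conn.execute(
        "SELECT url FROM pages WHERE size > ? ORDER BY size", (10000,)
    )
]
print(large)

# Ask SQLite how it plans to run the query.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT url FROM pages WHERE size > 10000 ORDER BY size"
):
    print(row)
```

An index like this helps with comparisons on the indexed column; it does nothing for finding words inside the "body" text.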

some databases have a feature called a "fulltext" index that can be built 
on text columns to make searching for words faster than doing simple "LIKE" 
queries.  This can work in some use cases, but these database "fulltext" 
indexes tend to be very limiting and not easy to customize.
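
To show the general idea behind word-based indexes, here is a toy sketch in plain Python comparing a LIKE-style scan with a lookup in an inverted index (a map from each word to the pages that contain it). It is only an illustration under simplified assumptions: the sample pages and function names are invented, and it is not how any particular database or Lucene actually implements its index.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text.
pages = {
    "http://example.com/a": "Lucene is a search library",
    "http://example.com/b": "Oracle is a relational database",
    "http://example.com/c": "Solr is a search server built on Lucene",
}


def scan(pages, word):
    """LIKE-style search: look through the text of every page."""
    needle = word.lower()
    return sorted(url for url, text in pages.items() if needle in text.lower())


def build_inverted_index(pages):
    """Map each lowercased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(url)
    return index


def lookup(index, word):
    """Word search: a single dictionary lookup instead of a scan."""
    return sorted(index.get(word.lower(), set()))


index = build_inverted_index(pages)
print(scan(pages, "lucene"))
print(lookup(index, "lucene"))
```

The scan has to read every page for every query, while the index is built once and then answers each word query with one lookup. The two are not equivalent, though: the scan matches substrings, the index matches whole words only.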

based on what you've described, a couple of Lucene subprojects might be 
useful to you...

http://lucene.apache.org/nutch/
Nutch is specifically designed to crawl and index webpages.

http://lucene.apache.org/solr/
Solr is a search "application" that lets you index/query content using 
any language over HTTP.  It comes with a DataImportHandler plugin that 
lets you automatically index databases using configuration to describe how 
to fetch the logical contents of each "document".
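
As a sketch of what "over HTTP" can look like from client code, the snippet below only builds a query URL for a Solr select handler; it does not contact any server. The base URL (host, port and path) and the "title" field are assumptions made for the example and depend on how a given Solr instance is installed and configured. The "q", "rows" and "wt" names are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Assumed location of a Solr select handler; adjust to the real installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, rows=10):
    """Return a URL asking Solr for up to `rows` matches in JSON format."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


# "title" is a hypothetical field name that would have to exist in the schema.
url = build_query_url("title:lucene", rows=5)
print(url)
```

Any language that can send an HTTP GET request to such a URL and parse the response can use Solr as its search backend.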

http://lucene.apache.org/java/
Lucene-Java is the underlying search library used in both Nutch and Solr, 
if you want to custom build search based logic you can use this library.  
As you mentioned, there is also a Hibernate project for integrating with 
Lucene.

if you have followup questions about any of those 3 subprojects, please 
consult the specific user mailing list for the project that you are 
interested in.


-Hoss