Posted to java-user@lucene.apache.org by Rehan Syed <re...@yahoo.com> on 2002/09/25 08:50:55 UTC

Lucene and RDBMS.

Hi,

I am in the process of implementing a knowledge base for internal use by my company.  
The contents of this knowledge base will be stored in one or more database table(s).  I am evaluating Lucene for performing text searches on this knowledge base. I understand that Lucene has two components, indexing and searching, but that both of these components work on files, not on text data stored in an RDBMS.  

In order for me to use Lucene, would I need to develop a process that will extract text data out of the database, create text files and then do the indexing and searching?  Are there any other approaches to this problem?  Comments/suggestions would be greatly appreciated.




RE: Lucene and RDBMS.

Posted by "Nader S. Henein" <ns...@bayt.net>.
The initial motivation for switching from Intermedia to Lucene was to take a
first step toward DB abstraction: if you rely on Intermedia for your indexing
and searching, you're pretty much stuck with Oracle, an excellent DB, but if
your business is growing the licensing fees become massive. Another thing is
that I don't maintain one index on the database server; I maintain an index
on each webserver, which allowed me to reduce the average load on the DB
machine by 78%. It's a bit of a synchronization nightmare, but we've had it
in place for the past three months without incident, plus you have redundant
indexes in case one becomes corrupted. Furthermore, the traffic between the
DB machine and the webservers, which was inflated by having to pass search
results back and forth, has dropped dramatically.

Now, the true joy of using Lucene is the performance boost you'll get: we
had Intermedia customized and tuned to our needs, yet Lucene was able to give
a 200% increase in performance, a huge asset to our site, which is mainly
search-driven.

PS: the reason we create XML files and then hand them to Lucene is that the
files are also used for display and caching purposes: once they are
transmitted to the webserver machines, they save me the hassle of retrieving
the documents from the database, since the files are the most recent version
of the documents.

Nader Henein


-----Original Message-----
From: Mariusz Dziewierz [mailto:aristot@student.uci.agh.edu.pl]
Sent: Wednesday, September 25, 2002 4:23 PM
To: Lucene Users List
Subject: Re: Lucene and RDBMS.


Nader S. Henein wrote:
> We had to do the same thing: we moved from an Oracle Intermedia search to
> Lucene (much better), with the data stored in the database.

Could you give some reasons that led you to the conclusion that Lucene is
much better than Oracle Intermedia in terms of searching data stored in
a database? I'm currently reviewing technologies related to text mining
and I am very curious about your motives because I haven't had the
opportunity to evaluate both technologies yet.

--
Mariusz Dziewierz


--
To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
For additional commands, e-mail:
<ma...@jakarta.apache.org>





Re: Comparing Intermedia and Lucene

Posted by Joshua O'Madadhain <jm...@ics.uci.edu>.
On Wed, 25 Sep 2002, Peter Carlson wrote:

> Can you share your code? We would love to add it to the contributions
> area.

I could, but I'd prefer to wait until I have had a chance to do some more
testing on it.  Right now it requires a fair amount of time and memory and
it's not clear that the return is worth the cost.  <wry smile> I hope to
do the necessary investigations within a few months; once I've done this
(and hopefully published a paper from it), I'll be more than happy to post
it to the contributions area.

Regards,

Joshua 

 jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
  Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.





Re: Comparing Intermedia and Lucene

Posted by Peter Carlson <ca...@bookandhammer.com>.
Can you share your code? We would love to add it to the contributions
area.

--Peter


On Wednesday, September 25, 2002, at 11:39 AM, Joshua O'Madadhain wrote:

> On Wed, 25 Sep 2002, Peter Carlson wrote:
>
>> I have used intermedia in the past and found that it had a few
>> advantages and disadvantages compared to Lucene.
>>
>> Advantages of Intermedia
>> 4) Supports term expansion and contraction
>>
>> Now items 3 and 4 of Intermedia advantages can be added as Features to
>> Lucene, but are not currently.
>
> I'm not sure what you mean by "supports term expansion".  In the IR
> community, there are many different mechanisms for term expansion; perhaps
> in the database community the term is more restrictive.  My understanding
> is that the only thing you need in order to be able to expand terms (or
> queries) is to be able to modify queries, which you can certainly do under
> Lucene.  (A research project of mine uses Lucene as the core for a search
> engine that does query expansion, among other things.)  Yes, I had to
> write some code to do this, but static term expansion, in which each term
> is expanded to a fixed list of other terms (which may be
> specifiable/modifiable by administrators or users), is pretty
> straightforward to code (my project used a considerably more sophisticated
> mechanism).
>
> If what you mean is that there is no specific API for term expansion in
> Lucene, that's true, but I'm not sure how much value such an API would add
> to Lucene.
>
> Joshua O'Madadhain



Re: Comparing Intermedia and Lucene

Posted by Joshua O'Madadhain <jm...@ics.uci.edu>.
On Wed, 25 Sep 2002, Peter Carlson wrote:

> I have used intermedia in the past and found that it had a few 
> advantages and disadvantages compared to Lucene.
> 
> Advantages of Intermedia
> 4) Supports term expansion and contraction
> 
> Now items 3 and 4 of Intermedia advantages can be added as Features to 
> Lucene, but are not currently.

I'm not sure what you mean by "supports term expansion".  In the IR
community, there are many different mechanisms for term expansion; perhaps
in the database community the term is more restrictive.  My understanding
is that the only thing you need in order to be able to expand terms (or
queries) is to be able to modify queries, which you can certainly do under
Lucene.  (A research project of mine uses Lucene as the core for a search
engine that does query expansion, among other things.)  Yes, I had to
write some code to do this, but static term expansion, in which each term
is expanded to a fixed list of other terms (which may be
specifiable/modifiable by administrators or users), is pretty
straightforward to code (my project used a considerably more sophisticated
mechanism).

If what you mean is that there is no specific API for term expansion in
Lucene, that's true, but I'm not sure how much value such an API would add
to Lucene.
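
To illustrate the static term expansion described above, here is a minimal
sketch. It is not the code from the research project mentioned in this
message (that code was not posted); the function names and the expansion
table are hypothetical. It rewrites a query string so that each term becomes
an OR-group of itself plus a fixed list of other terms, assuming the result
is then handed to a query parser that accepts OR and parentheses, as Lucene's
QueryParser syntax does.

```python
def _quote(term):
    """Wrap multi-word expansions in double quotes so they stay one phrase."""
    return f'"{term}"' if " " in term else term


def expand_query(query, expansions):
    """Rewrite each whitespace-separated term as an OR-group of the term
    itself plus its fixed expansions (hypothetical helper, not a Lucene API).

    `expansions` maps a lower-cased term to a list of alternative terms,
    which an administrator or user could maintain.
    """
    parts = []
    for term in query.split():
        alternatives = [term] + list(expansions.get(term.lower(), []))
        if len(alternatives) == 1:
            # No expansion configured for this term: leave it untouched.
            parts.append(term)
        else:
            parts.append("(" + " OR ".join(_quote(a) for a in alternatives) + ")")
    return " ".join(parts)
```

With a table such as {"car": ["automobile", "vehicle"], "fix": ["repair"]},
the query "fix car engine" would be rewritten before parsing, while terms
without an entry pass through unchanged. Anything more sophisticated (weights,
expansion learned from the collection) is outside the scope of this sketch.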

Joshua O'Madadhain

 jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
  Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.





Comparing Intermedia and Lucene

Posted by Peter Carlson <ca...@bookandhammer.com>.
I have used Intermedia in the past and found that it had a few
advantages and disadvantages compared to Lucene.


Advantages of Intermedia
1) Content is in the database, so it can be easily accessed and fielded
out with many tools
2) Can combine text queries with SQL queries
3) Has richer support for themes
4) Supports term expansion and contraction

Now, items 3 and 4 of the Intermedia advantages could be added as features
to Lucene, but are not there currently.

Advantages of Lucene
1) Search speed (Intermedia was very, very slow for us when searching full
text; also, I run Lucene on a single-processor Sun, and I ran Intermedia
on a dual-processor box with a RAID)
2) Index integrity (when we indexed with Intermedia it would fail 1 out
of every 5 or 6 times)
3) Indexing speed (indexing was about 2/3 slower on Intermedia)
4) File based (it is easy to distribute a Lucene index on multiple
computers, whereas with Intermedia you had to connect to one mondy machine)
5) Cost
6) Support (the Lucene mailing list provides much better support than
Oracle)
7) Personal skill level (to run Intermedia I would recommend having
someone knowledgeable as a DBA)
8) Built-in QueryParser (Intermedia does not come with a way to parse
the query string into an Intermedia query; you have to write your own).


Some of these issues may have been specific to my situation, but that's
my experience.

--Peter

On Wednesday, September 25, 2002, at 05:22 AM, Mariusz Dziewierz wrote:

> Nader S. Henein wrote:
>> We had to do the same thing: we moved from an Oracle Intermedia search to
>> Lucene (much better), with the data stored in the database.
>
> Could you give some reasons that led you to the conclusion that Lucene
> is much better than Oracle Intermedia in terms of searching data
> stored in a database? I'm currently reviewing technologies related to
> text mining and I am very curious about your motives because I haven't
> had the opportunity to evaluate both technologies yet.
>
> -- 
> Mariusz Dziewierz

Re: Lucene and RDBMS.

Posted by Mariusz Dziewierz <ar...@student.uci.agh.edu.pl>.
Nader S. Henein wrote:
> We had to do the same thing: we moved from an Oracle Intermedia search to
> Lucene (much better), with the data stored in the database.

Could you give some reasons that led you to the conclusion that Lucene is
much better than Oracle Intermedia in terms of searching data stored in
a database? I'm currently reviewing technologies related to text mining
and I am very curious about your motives because I haven't had the
opportunity to evaluate both technologies yet.

-- 
Mariusz Dziewierz



RE: Lucene and RDBMS.

Posted by "Nader S. Henein" <ns...@bayt.net>.
We had to do the same thing: we moved from an Oracle Intermedia search to
Lucene (much better), with the data stored in the database.
What we did is produce XML files at an interval (every 15 minutes); those
files are picked up by the indexer, which deletes any previous occurrence of
the same entry, re-indexes the new one, and then optimizes the index. You
could do the whole process in one shot: retrieve a stream from the DB and
pass it directly to Lucene, but the stream should be in field/value pairs
(so XML makes sense).

The answer to your question is no, you don't have to use files to create the
index. The index itself is file-based, though.
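
As a rough illustration of the "one shot" approach described above (no
intermediate files), the sketch below reads rows straight from a database as
field/value pairs, deletes any previous copy of each entry, and re-indexes
the current one. Everything here is an assumption made for illustration:
ToyIndex is an in-memory stand-in and NOT the Lucene API, and the table name
kb_article and key column id are hypothetical. A real setup would hand the
same field/value pairs to Lucene instead.

```python
import sqlite3
from collections import defaultdict


class ToyIndex:
    """In-memory stand-in for a full-text index (NOT the Lucene API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of entry ids
        self.docs = {}                    # entry id -> field/value dict

    @staticmethod
    def _terms(fields):
        # Naive tokenizer: lower-case every value and split on whitespace.
        for value in fields.values():
            yield from str(value).lower().split()

    def delete(self, doc_id):
        """Remove any previous occurrence of this entry, if there is one."""
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for term in self._terms(old):
                self.postings[term].discard(doc_id)

    def add(self, doc_id, fields):
        self.docs[doc_id] = fields
        for term in self._terms(fields):
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


def rows_as_field_value_pairs(conn, table):
    """Yield each row of `table` as a {column: value} dict.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


def reindex(conn, index, table="kb_article", key="id"):
    """Delete-then-add every row so the index holds only current versions."""
    for fields in rows_as_field_value_pairs(conn, table):
        index.delete(fields[key])       # drop the stale copy first
        index.add(fields[key], fields)  # then index the current one
```

Running reindex on a schedule (for example every 15 minutes, as above) keeps
the index in step with the table; restricting the SELECT to rows changed
since the last run would be a natural refinement, not shown here.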

Nader Henein

-----Original Message-----
From: Rehan Syed [mailto:rehan_n_syed@yahoo.com]
Sent: Wednesday, September 25, 2002 10:51 AM
To: lucene-user@jakarta.apache.org
Subject: Lucene and RDBMS.



Hi,

I am in the process of implementing a knowledge base for internal use by my
company.
The contents of this Knowledge base will be stored in one or more database
table(s).  I am evaluating Lucene for performing text searches on this
Knowledge base. I understand that Lucene has two components, indexing and
searching, but both these components work on files, not on text data stored
in an RDBMS.

In order for me to use Lucene, would I need to develop a process that will
extract text data out of the database, create text files and then do the
indexing and searching?  Are there any other approaches to this problem?
Comments/suggestions would be greatly appreciated.


