Posted to java-user@lucene.apache.org by "Duan, Nick" <ND...@mcdonaldbradley.com> on 2008/03/04 19:33:19 UTC

Why indexing database is necessary? (RE: indexing database)

Could anyone provide some insight into why someone would use Nutch/Lucene
or any other search engine to index a relational database, with use
cases if possible?  Shouldn't the database's own indexing mechanism be
used, since it is more efficient?

If there is a need to index database content with a search engine,
what would be the best approach other than de-normalizing the
database?

Thanks a lot in advance!

ND
-----Original Message-----
From: payo [mailto:payo22@yahoo.com] 
Sent: Tuesday, March 04, 2008 12:36 PM
To: nutch-user@lucene.apache.org
Subject: indexing database


hi to all

Can I index a database with Nutch?

I am using Nutch 0.8.1.

thanks
-- 
View this message in context:
http://www.nabble.com/indexing-database-tp15832696p15832696.html
Sent from the Nutch - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Why indexing database is necessary? (RE: indexing database)

Posted by Will Johnson <wi...@gmail.com>.
Not necessarily. Many of the high-traffic search sites on the market today,
for everything from yellow pages to job boards to ecommerce sites, use search
engines exclusively to search *and* retrieve/serve content.  The key is that
they don't have to return all matching rows, only the 'best' ones, which are
probably the ones you would want anyway.
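
To illustrate the point about returning only the 'best' matches, here is a
minimal Python sketch. It is not Lucene code, and the scores and document ids
are made up for the example. It keeps just the top k hits with a bounded heap
instead of sorting every matching row.

```python
import heapq

def top_k(scored_hits, k):
    """Return the k highest-scoring (score, doc_id) pairs, best first.

    heapq.nlargest keeps at most k items at a time, so the cost is roughly
    O(n log k) instead of the O(n log n) needed to sort every match.
    """
    return heapq.nlargest(k, scored_hits)

# Hypothetical (score, doc_id) pairs for a query that matched six documents.
hits = [(0.2, "d1"), (0.9, "d2"), (0.5, "d3"), (0.7, "d4"), (0.1, "d5"), (0.8, "d6")]
best = top_k(hits, 3)
```

A search engine can stop at the first page of results this way, whereas a
plain SQL query without a limit has to produce every qualifying row.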

- will





Re: Why indexing database is necessary? (RE: indexing database)

Posted by Erick Erickson <er...@gmail.com>.
And one other point. You probably *don't* need a search engine for your
database *if* you don't have much textual data. That is, if your database
consists of "classical" tables with columns like "firstname", "lastname",
etc.

But if your database has columns in it containing, say, a page of text then
searching that text is a real pain. *That's* where a search engine shines.

Searching a large DB text field for a single word becomes...er...awkward.
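
As a rough illustration of why that is awkward, the sketch below uses
Python's built-in sqlite3 module as a stand-in for "a database" (an
assumption on my part; the table and column names are hypothetical, and other
databases or their full-text extensions may behave differently). It asks
SQLite for its query plan: an exact match on the column can use the ordinary
index, while a word buried inside the text cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE INDEX idx_body ON articles (body)")
conn.executemany(
    "INSERT INTO articles (body) VALUES (?)",
    [("Lucene builds an inverted index",), ("B-tree indexes sort whole values",)],
)

def plan(sql, params):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Exact match on the whole column value: the B-tree index can be searched.
exact_plan = plan(
    "SELECT id FROM articles WHERE body = ?",
    ("Lucene builds an inverted index",),
)

# A word somewhere inside the text: the leading wildcard rules out an index
# lookup, so every entry has to be scanned and compared.
word_plan = plan("SELECT id FROM articles WHERE body LIKE ?", ("%inverted%",))

matches = conn.execute(
    "SELECT id FROM articles WHERE body LIKE ?", ("%inverted%",)
).fetchall()
```

The query still returns the right row; the problem is only that the work
grows with the total amount of text stored rather than with the number of
matches.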

That said, there's a long thread on the Lucene list, which I didn't
understand at all, about embedding Lucene in Oracle. You might try
searching the Lucene list archives for it...

Best
Erick


Re: Why indexing database is necessary? (RE: indexing database)

Posted by Chris Lu <ch...@gmail.com>.
Hi, Nick,

A Lucene index is, in a sense, another kind of database index, one that is
inverted (terms point to the documents that contain them).

If we ask why a database needs many indexes, the answer is that each one
gives queries a different execution path.
The same is true of a Lucene index, which is faster for term matching.
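
For readers who have not seen one, here is a toy inverted index in Python.
It is only a sketch of the idea, not how Lucene stores or queries its index;
the function names and sample documents are invented for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lower-cased term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Lucene is a full text search library",
    2: "A relational database uses B-tree indexes",
    3: "Full text search over database content",
}
index = build_inverted_index(docs)
```

A term lookup is a dictionary access followed by set intersections, so the
cost depends on how many documents contain the terms, not on how much text
is stored in total.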

A Lucene index can actually do more. One example is faceted search, which
tells you how many matches fall into each category (facet), in addition to
returning the matched results. That makes it easier for websites to display
results and to offer links that let users narrow them down.
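
A minimal sketch of what facet counts are, in plain Python rather than any
real Lucene or Solr API (the product data and the helper name are made up):
after finding the matching documents, count them per category value.

```python
from collections import Counter

def facet_counts(docs, matching_ids, field):
    """Count how many matching documents fall into each value of `field`."""
    return Counter(docs[doc_id][field] for doc_id in matching_ids)

# Hypothetical catalogue; "category" is the facet field.
products = {
    1: {"name": "red running shoe", "category": "shoes"},
    2: {"name": "red rain jacket", "category": "jackets"},
    3: {"name": "red trail shoe", "category": "shoes"},
    4: {"name": "blue rain jacket", "category": "jackets"},
}

# Documents matching the query term "red".
matching = [pid for pid, p in products.items() if "red" in p["name"].split()]
counts = facet_counts(products, matching, "category")
```

A site would render these counts as "shoes (2), jackets (1)" links next to
the result list, each link re-running the search restricted to that category.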

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request)
got 2.6 Million Euro funding!





RE: Why indexing database is necessary? (RE: indexing database)

Posted by "Duan, Nick" <ND...@mcdonaldbradley.com>.
Hmm, I guess that's because a database query returns a list of records,
whereas a search engine returns only the links, not the actual content.
So a search engine works only in the index space, whereas a database
query engine has to work in both index and content space...


ND





RE: Why indexing database is necessary? (RE: indexing database)

Posted by Will Johnson <wi...@gmail.com>.
Don't forget the number 1 reason: speed.  For certain types of queries a
search engine can return results orders of magnitude faster than a database.
I've seen search engines return hits in hundreds of milliseconds when the
same database query took hours or even days.  That's not to say that a
search engine is always better, just that it often is when the
inputs and outputs are carefully defined.

- will





Re: Why indexing database is necessary? (RE: indexing database)

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Hi,

We have built a data import tool that can read from databases and add
their content to Solr. We found that making content available for full-text
search and faceted search was a common use case, and that everyone usually
ends up writing a custom ETL-based tool for the task. We are therefore
contributing this back to the Solr project.
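
To give a feel for what such an import step involves, here is a small Python
sketch. It is not the tool described above and uses none of its
configuration; the schema, table names and document layout are hypothetical,
and sqlite3 stands in for the source database. It joins a parent table with a
child table and folds the rows into one flat document per record, which is
the shape a search index expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    CREATE TABLE product_tag (product_id INTEGER, tag TEXT);
    INSERT INTO product VALUES (1, 'Trail shoe', 'Lightweight shoe for rough terrain');
    INSERT INTO product VALUES (2, 'Rain jacket', 'Waterproof shell');
    INSERT INTO product_tag VALUES (1, 'outdoor'), (1, 'running'), (2, 'outdoor');
""")

def build_documents(conn):
    """Join parent and child rows and fold them into one flat document per product."""
    documents = {}
    rows = conn.execute("""
        SELECT p.id, p.name, p.description, t.tag
        FROM product p LEFT JOIN product_tag t ON t.product_id = p.id
        ORDER BY p.id, t.tag
    """)
    for pid, name, description, tag in rows:
        doc = documents.setdefault(
            pid, {"id": pid, "name": name, "description": description, "tags": []}
        )
        if tag is not None:  # products without tags still get a document
            doc["tags"].append(tag)
    return list(documents.values())

docs = build_documents(conn)
```

The database itself stays normalized; only the documents handed to the
indexer are flattened, with the child rows becoming a multi-valued field.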

Please look at https://issues.apache.org/jira/browse/SOLR-469 for
details. A user guide is provided at
http://wiki.apache.org/solr/DataImportHandler

I realize that having such a tool for Lucene would also be helpful for
a large audience. However, we're currently more focused on Solr, since
we don't use Lucene directly in our own production environments.




-- 
Regards,
Shalin Shekhar Mangar.



RE: Why indexing database is necessary? (RE: indexing database)

Posted by Darren Hartford <dh...@ghsinc.com>.
Reasons to index with Lucene/Nutch on top of, or instead of, DB indexing:

1) relevance scoring
2) alias searching (e.g. a large number of aliases, such as first names)
3) highlighting
4) cross-datasource searching (multi DB, DB + XML files, etc.).
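
Two of the items above, scoring and highlighting, can be sketched in a few
lines of Python. This is an illustration only: the TF-IDF formula below is a
textbook simplification and not Lucene's actual scoring, and the function
names and sample documents are made up.

```python
import math
import re

def tokenize(text):
    """Split text into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())

def rank(docs, query):
    """Rank documents by a basic TF-IDF sum over the query terms, best first."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(docs)
    scores = {}
    for term in tokenize(query):
        df = sum(1 for toks in tokens.values() if term in toks)
        if df == 0:
            continue
        idf = math.log(1 + n / df)  # rarer terms weigh more
        for doc_id, toks in tokens.items():
            tf = toks.count(term)
            if tf:
                scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

def highlight(text, query):
    """Wrap whole-word, case-insensitive matches of query terms in asterisks."""
    terms = [re.escape(t) for t in tokenize(query)]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(r"*\1*", text)

docs = {
    1: "Search engines rank results by relevance.",
    2: "A database returns every matching row.",
    3: "Full-text search in a database column is awkward.",
    4: "A database index is usually a B-tree.",
}
ranked = rank(docs, "search database")
snippet = highlight(docs[3], "search database")
```

Document 3 comes first because it contains both query terms, and document 1
outranks 2 and 4 because "search" is the rarer term in this small collection.
A plain SQL WHERE clause would return the same four rows with no such order.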

As for the best approach to indexing externally, I do not have any direct
pointers.  I would recommend looking at an ETL tool that can be extended
for this purpose (I started writing a plugin for Pentaho, but got
pulled off and haven't finished it -- and that was for Solr, not
Lucene/Nutch).

-D


