Posted to solr-user@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2010/10/12 02:32:36 UTC

LuceneRevolution - NoSQL: A comparison

I listened with great interest to Grant's presentation of the NoSQL
comparisons/alternatives to Solr/Lucene. It sounds like the jury is still
out on much of this. Here's a use case that might favor using a NoSQL
alternative for storing 'stored fields' outside of Lucene.

When Solr does a distributed search across shards, it does this in 2 phases
(correct me if I'm wrong):

1. 1st query to get the docIds and facet counts
2. 2nd query to retrieve the stored fields of the top hits

The problem here is that the index could change between (1) and (2), so it's
not an atomic transaction. If the stored fields were kept outside of Lucene,
only the first query would be necessary. However, this would mean that the
external NoSQL data store would have to be synchronized with the Lucene
index, which might present its own problems. (I'm just throwing this out for
discussion)

Peter
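The two-phase flow described above can be sketched in a few lines. This is a toy simulation with made-up shard data, not Solr's actual code; it just shows where the non-atomic window sits between the phases.

```python
# Sketch (not Solr's implementation): two-phase distributed search over
# hypothetical in-memory "shards", showing that the index can change
# between phase 1 (collect docIds) and phase 2 (fetch stored fields).

def phase1(shards, rows):
    """Each shard contributes its top docIds; the coordinator merges and sorts."""
    candidates = []
    for shard in shards:
        # (doc_id, score) pairs, already sorted per shard
        candidates.extend(shard["top_docs"][:rows])
    candidates.sort(key=lambda d: d[1], reverse=True)
    return [doc_id for doc_id, _ in candidates[:rows]]

def phase2(shards, doc_ids):
    """Fetch stored fields for the winning docIds from whichever shard has them."""
    fields = {}
    for shard in shards:
        for doc_id in doc_ids:
            if doc_id in shard["stored"]:
                fields[doc_id] = shard["stored"][doc_id]
    return fields

shards = [
    {"top_docs": [("a", 9.0), ("b", 7.0)], "stored": {"a": "doc a", "b": "doc b"}},
    {"top_docs": [("c", 8.0)], "stored": {"c": "doc c"}},
]

top = phase1(shards, rows=2)      # ['a', 'c']
del shards[1]["stored"]["c"]      # the index changes between the phases...
hits = phase2(shards, top)        # ...so 'c' silently has no stored fields
print(top, hits)                  # ['a', 'c'] {'a': 'doc a'}
```

If the stored fields lived in an external store keyed by docId, phase 2 would become a lookup against that store instead, which is the trade-off under discussion.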

Re: LuceneRevolution - NoSQL: A comparison

Posted by Péter Király <ki...@gmail.com>.
2010/10/12 Peter Keegan <pe...@gmail.com>:
> I listened with great interest to Grant's presentation of the NoSQL
> comparisons/alternatives to Solr/Lucene.

My question: will this presentation be available somewhere? I do not
find any presentation material on the conference web site.

Király Péter
http://eXtensibleCatalog.org

Re: LuceneRevolution - NoSQL: A comparison

Posted by Shawn Heisey <so...@elyograg.org>.
  On 10/13/2010 6:46 AM, Yonik Seeley wrote:
>
> A related point - the load balancing implementation that's part of
> SolrCloud (and looks like it will be committed to trunk soon), does
> keep track of what server it used for the first phase and uses that
> for subsequent phases.

Are the cloud bits likely to be merged into branch_3x as well?



Re: LuceneRevolution - NoSQL: A comparison

Posted by Dennis Gearon <ge...@sbcglobal.net>.
Ahh, LOL! I wouldn't have thought about that unless I were fixing the issues that you guys have worked on.


Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Wed, 10/13/10, Jan Høydahl / Cominvent <ja...@cominvent.com> wrote:

> From: Jan Høydahl / Cominvent <ja...@cominvent.com>
> Subject: Re: LuceneRevolution - NoSQL: A comparison
> To: solr-user@lucene.apache.org
> Date: Wednesday, October 13, 2010, 3:32 PM
> You don't know what documents to
> bring up summaries for before you have merged and sorted the
> docIds from all shards. And you don't want to waste
> resources by fetching it all. Example:
> Phase 1 request: q=foo bar&rows=10&sort=price
> asc&shards=node1:8983,node2:8983,node3:8983
> Phase 1 response: The doc-ids of the 10 lowest prices
> products from each of the three shards (total 30)
> Phase 2 request: Request the summaries based on IDs. This
> may be 3+3+4, 10+0+0, 1+1+8 or any other distribution
> Phase 2 response: The actual summaries from each shard
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> On 13. okt. 2010, at 19.04, Dennis Gearon wrote:
> 
> > I think that's good thinking. I wonder, do the two
> phases have to be invoked externally by two queries, or why
> couldn't it be all self contained in each instance behind
> the load leveler?
> > 
> > Just curious how it works.
> > 
> > Dennis Gearon
> > 
> > Signature Warning
> > ----------------
> > It is always a good idea to learn from your own
> mistakes. It is usually a better idea to learn from
> others’ mistakes, so you do not have to make them
> yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> > 
> > EARTH has a Right To Life,
> >  otherwise we all die.
> > 
> > 
> > --- On Wed, 10/13/10, Yonik Seeley <yo...@lucidimagination.com>
> wrote:
> > 
> >> From: Yonik Seeley <yo...@lucidimagination.com>
> >> Subject: Re: LuceneRevolution - NoSQL: A
> comparison
> >> To: solr-user@lucene.apache.org
> >> Date: Wednesday, October 13, 2010, 5:46 AM
> >> On Tue, Oct 12, 2010 at 12:11 PM, Jan
> >> Høydahl / Cominvent
> >> <ja...@cominvent.com>
> >> wrote:
> >>> I'm pretty sure the 2nd phase to fetch
> doc-summaries
> >> goes directly to same server as first phase. But
> what if you
> >> stick a LB in between?
> >> 
> >> A related point - the load balancing
> implementation that's
> >> part of
> >> SolrCloud (and looks like it will be committed to
> trunk
> >> soon), does
> >> keep track of what server it used for the first
> phase and
> >> uses that
> >> for subsequent phases.
> >> 
> >> -Yonik
> >> http://www.lucidimagination.com
> >> 
> 
> 

Re: LuceneRevolution - NoSQL: A comparison

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.
You don't know what documents to bring up summaries for before you have merged and sorted the docIds from all shards. And you don't want to waste resources by fetching it all. Example:
Phase 1 request: q=foo bar&rows=10&sort=price asc&shards=node1:8983,node2:8983,node3:8983
Phase 1 response: The doc-ids of the 10 lowest-priced products from each of the three shards (30 total)
Phase 2 request: Request the summaries based on IDs. This may be 3+3+4, 10+0+0, 1+1+8 or any other distribution
Phase 2 response: The actual summaries from each shard
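That merge step can be sketched as follows. The prices and node names are invented for illustration; the point is that any per-shard distribution of the phase-2 fetches summing to 10 is possible.

```python
# Sketch of the example above: three shards each return their 10 cheapest
# (price, docId) pairs sorted ascending; the coordinator merges all 30 and
# keeps the 10 globally cheapest, then counts phase-2 fetches per shard.

import heapq

def merge_rows(per_shard, rows=10):
    """per_shard: {shard: [(price, doc_id), ...]}, each list sorted ascending."""
    merged = heapq.merge(*[[(p, d, s) for p, d in docs]
                           for s, docs in per_shard.items()])
    winners = [next(merged) for _ in range(rows)]
    counts = {s: 0 for s in per_shard}
    for _, _, s in winners:
        counts[s] += 1
    return winners, counts

per_shard = {
    "node1": [(p, f"n1-{i}") for i, p in enumerate([1, 4, 7, 10, 13, 16, 19, 22, 25, 28])],
    "node2": [(p, f"n2-{i}") for i, p in enumerate([2, 5, 8, 11, 14, 17, 20, 23, 26, 29])],
    "node3": [(p, f"n3-{i}") for i, p in enumerate([3, 6, 9, 12, 15, 18, 21, 24, 27, 30])],
}
winners, counts = merge_rows(per_shard)
print(counts)   # {'node1': 4, 'node2': 3, 'node3': 3}
```

With different price distributions the counts could just as well come out 10+0+0 or 1+1+8, as Jan notes.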

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 13. okt. 2010, at 19.04, Dennis Gearon wrote:

> I think that's good thinking. I wonder, do the two phases have to be invoked externally by two queries, or why couldn't it be all self contained in each instance behind the load leveler?
> 
> Just curious how it works.
> 
> Dennis Gearon
> 
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> EARTH has a Right To Life,
>  otherwise we all die.
> 
> 
> --- On Wed, 10/13/10, Yonik Seeley <yo...@lucidimagination.com> wrote:
> 
>> From: Yonik Seeley <yo...@lucidimagination.com>
>> Subject: Re: LuceneRevolution - NoSQL: A comparison
>> To: solr-user@lucene.apache.org
>> Date: Wednesday, October 13, 2010, 5:46 AM
>> On Tue, Oct 12, 2010 at 12:11 PM, Jan
>> Høydahl / Cominvent
>> <ja...@cominvent.com>
>> wrote:
>>> I'm pretty sure the 2nd phase to fetch doc-summaries
>> goes directly to same server as first phase. But what if you
>> stick a LB in between?
>> 
>> A related point - the load balancing implementation that's
>> part of
>> SolrCloud (and looks like it will be committed to trunk
>> soon), does
>> keep track of what server it used for the first phase and
>> uses that
>> for subsequent phases.
>> 
>> -Yonik
>> http://www.lucidimagination.com
>> 


Re: LuceneRevolution - NoSQL: A comparison

Posted by Dennis Gearon <ge...@sbcglobal.net>.
I think that's good thinking. I wonder, do the two phases have to be invoked externally by two queries, or why couldn't it be all self contained in each instance behind the load leveler?

Just curious how it works.

Dennis Gearon



--- On Wed, 10/13/10, Yonik Seeley <yo...@lucidimagination.com> wrote:

> From: Yonik Seeley <yo...@lucidimagination.com>
> Subject: Re: LuceneRevolution - NoSQL: A comparison
> To: solr-user@lucene.apache.org
> Date: Wednesday, October 13, 2010, 5:46 AM
> On Tue, Oct 12, 2010 at 12:11 PM, Jan
> Høydahl / Cominvent
> <ja...@cominvent.com>
> wrote:
> > I'm pretty sure the 2nd phase to fetch doc-summaries
> goes directly to same server as first phase. But what if you
> stick a LB in between?
> 
> A related point - the load balancing implementation that's
> part of
> SolrCloud (and looks like it will be committed to trunk
> soon), does
> keep track of what server it used for the first phase and
> uses that
> for subsequent phases.
> 
> -Yonik
> http://www.lucidimagination.com
> 

Re: LuceneRevolution - NoSQL: A comparison

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Oct 12, 2010 at 12:11 PM, Jan Høydahl / Cominvent
<ja...@cominvent.com> wrote:
> I'm pretty sure the 2nd phase to fetch doc-summaries goes directly to same server as first phase. But what if you stick a LB in between?

A related point - the load balancing implementation that's part of
SolrCloud (and looks like it will be committed to trunk soon), does
keep track of what server it used for the first phase and uses that
for subsequent phases.

-Yonik
http://www.lucidimagination.com

Re: LuceneRevolution - NoSQL: A comparison

Posted by Dennis Gearon <ge...@sbcglobal.net>.
Some very interesting scenarios cropping up here. Sounds to me like some minor architectural changes will be in order for them to be addressed.

What's the usual delay from stage one to stage two? If that were some kind of constant value, old versions could be kept around for approximately double that time. You'd need one of two things:

  A/ Last accessed by Stage one stored per record
~or~
  B/ A regular MVCC system and garbage collection.
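Option A above could be sketched like this. The class, the retention rule, and all the numbers are illustrative, not an existing Solr mechanism.

```python
# Sketch of option A: keep old index versions around, record when phase 1
# last touched each one, and garbage-collect any version idle for roughly
# double the typical phase-1-to-phase-2 delay. Names and values are made up.

import time

class VersionKeeper:
    def __init__(self, typical_delay=2.0):
        self.retention = 2 * typical_delay   # "approx double that time"
        self.versions = {}                   # version -> last phase-1 access time

    def touch(self, version, now=None):
        self.versions[version] = now if now is not None else time.time()

    def collect(self, now=None):
        now = now if now is not None else time.time()
        expired = [v for v, t in self.versions.items()
                   if now - t > self.retention]
        for v in expired:
            del self.versions[v]             # close/delete the stale version
        return expired

keeper = VersionKeeper(typical_delay=2.0)
keeper.touch("gen-41", now=0.0)
keeper.touch("gen-42", now=3.0)
print(keeper.collect(now=5.0))   # gen-41 exceeds the 4s retention: ['gen-41']
```

Option B, a full MVCC scheme, generalizes this by letting readers pin arbitrary snapshots rather than relying on a fixed timeout.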


Dennis Gearon



--- On Tue, 10/12/10, Jan Høydahl / Cominvent <ja...@cominvent.com> wrote:

> From: Jan Høydahl / Cominvent <ja...@cominvent.com>
> Subject: Re: LuceneRevolution - NoSQL: A comparison
> To: solr-user@lucene.apache.org
> Date: Tuesday, October 12, 2010, 9:11 AM
> This is a different issue. You are
> seeing the latency between master index update and
> replication to slave(s).
> Solve this by pointing your monitoring script directly to
> slave instead of master.
> 
> What this thread is about is a potential difference in
> state during the execution of a single sharded query, not
> due to master/slave but due to the index being updated
> between phase 1 and phase 2.
> 
> I'm pretty sure the 2nd phase to fetch doc-summaries goes
> directly to same server as first phase. But what if you
> stick a LB in between? Then perhaps the first phase may go
> to master and second to slave?
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> On 12. okt. 2010, at 17.12, Shawn Heisey wrote:
> 
> > On 10/11/2010 6:32 PM, Peter Keegan wrote:
> >> When Solr does a distributed search across shards,
> it does this in 2 phases
> >> (correct me if I'm wrong):
> >> 
> >> 1. 1st query to get the docIds and facet counts
> >> 2. 2nd query to retrieve the stored fields of the
> top hits
> >> 
> >> The problem here is that the index could change
> between (1) and (2), so it's
> >> not an atomic transaction. If the stored fields
> were kept outside of Lucene,
> >> only the first query would be necessary. However,
> this would mean that the
> >> external NoSQL data store would have to be
> synchronized with the Lucene
> >> index, which might present its own problems. (I'm
> just throwing this out for
> >> discussion)
> > 
> > I've got a related issue that I have run into because
> of my use of a load balancer.
> > 
> > I have a total of seven shards, each of which has a
> replica.  I've got one set of machines set up as
> brokers that have the shards parameter in the standard
> request handler.  Queries are sent to the load
> balancer, which sends it to one of the brokers.  The
> shards parameter sends requests back to the load balancer to
> be ultimately sent to an actual server.
> > 
> > I have a monitoring script that retrieves the latest
> document and alarms if it's older than ten minutes. 
> Something that happens on occasion:
> > 
> > 1) An update is made to the master (happens every two
> minutes).
> > 2) Monitoring script requests newest document.
> > 3) Initial request is sent to master, finds ID.
> > 4) Second request is sent to the slave, document not
> found.
> > 5) Up to 15 seconds later, the slave replicates.
> > 
> > I solved this problem by having the monitoring script
> try several times on failure, waiting a few seconds on each
> loop.  Do I need to be terribly concerned about this
> impacting real queries?
> > 
> > I do not actually need to load balance, I have slave
> servers purely for failover.  Currently the load
> balancer has a 3 to 1 weight ratio favoring the slaves,
> which I plan to increase.  At one time I had the master
> set up as a backup rather than a lower weight target, but
> haproxy seemed to take longer to recover from failures in
> that mode.  I will have to do some more comprehensive
> testing.  If there's a better solution than haproxy
> that works with heartbeat, I can change that.
> > 
> > Thanks,
> > Shawn
> > 
> 
> 

Re: LuceneRevolution - NoSQL: A comparison

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.
This is a different issue. You are seeing the latency between master index update and replication to slave(s).
Solve this by pointing your monitoring script directly to slave instead of master.

What this thread is about is a potential difference in state during the execution of a single sharded query, not due to master/slave but due to the index being updated between phase 1 and phase 2.

I'm pretty sure the 2nd phase to fetch doc-summaries goes directly to same server as first phase. But what if you stick a LB in between? Then perhaps the first phase may go to master and second to slave?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 12. okt. 2010, at 17.12, Shawn Heisey wrote:

> On 10/11/2010 6:32 PM, Peter Keegan wrote:
>> When Solr does a distributed search across shards, it does this in 2 phases
>> (correct me if I'm wrong):
>> 
>> 1. 1st query to get the docIds and facet counts
>> 2. 2nd query to retrieve the stored fields of the top hits
>> 
>> The problem here is that the index could change between (1) and (2), so it's
>> not an atomic transaction. If the stored fields were kept outside of Lucene,
>> only the first query would be necessary. However, this would mean that the
>> external NoSQL data store would have to be synchronized with the Lucene
>> index, which might present its own problems. (I'm just throwing this out for
>> discussion)
> 
> I've got a related issue that I have run into because of my use of a load balancer.
> 
> I have a total of seven shards, each of which has a replica.  I've got one set of machines set up as brokers that have the shards parameter in the standard request handler.  Queries are sent to the load balancer, which sends it to one of the brokers.  The shards parameter sends requests back to the load balancer to be ultimately sent to an actual server.
> 
> I have a monitoring script that retrieves the latest document and alarms if it's older than ten minutes.  Something that happens on occasion:
> 
> 1) An update is made to the master (happens every two minutes).
> 2) Monitoring script requests newest document.
> 3) Initial request is sent to master, finds ID.
> 4) Second request is sent to the slave, document not found.
> 5) Up to 15 seconds later, the slave replicates.
> 
> I solved this problem by having the monitoring script try several times on failure, waiting a few seconds on each loop.  Do I need to be terribly concerned about this impacting real queries?
> 
> I do not actually need to load balance, I have slave servers purely for failover.  Currently the load balancer has a 3 to 1 weight ratio favoring the slaves, which I plan to increase.  At one time I had the master set up as a backup rather than a lower weight target, but haproxy seemed to take longer to recover from failures in that mode.  I will have to do some more comprehensive testing.  If there's a better solution than haproxy that works with heartbeat, I can change that.
> 
> Thanks,
> Shawn
> 


Re: LuceneRevolution - NoSQL: A comparison

Posted by Shawn Heisey <so...@elyograg.org>.
  On 10/11/2010 6:32 PM, Peter Keegan wrote:
> When Solr does a distributed search across shards, it does this in 2 phases
> (correct me if I'm wrong):
>
> 1. 1st query to get the docIds and facet counts
> 2. 2nd query to retrieve the stored fields of the top hits
>
> The problem here is that the index could change between (1) and (2), so it's
> not an atomic transaction. If the stored fields were kept outside of Lucene,
> only the first query would be necessary. However, this would mean that the
> external NoSQL data store would have to be synchronized with the Lucene
> index, which might present its own problems. (I'm just throwing this out for
> discussion)

I've got a related issue that I have run into because of my use of a 
load balancer.

I have a total of seven shards, each of which has a replica.  I've got 
one set of machines set up as brokers that have the shards parameter in 
the standard request handler.  Queries are sent to the load balancer, 
which sends it to one of the brokers.  The shards parameter sends 
requests back to the load balancer to be ultimately sent to an actual 
server.

I have a monitoring script that retrieves the latest document and alarms 
if it's older than ten minutes.  Something that happens on occasion:

1) An update is made to the master (happens every two minutes).
2) Monitoring script requests newest document.
3) Initial request is sent to master, finds ID.
4) Second request is sent to the slave, document not found.
5) Up to 15 seconds later, the slave replicates.

I solved this problem by having the monitoring script try several times 
on failure, waiting a few seconds on each loop.  Do I need to be 
terribly concerned about this impacting real queries?

I do not actually need to load balance, I have slave servers purely for 
failover.  Currently the load balancer has a 3 to 1 weight ratio 
favoring the slaves, which I plan to increase.  At one time I had the 
master set up as a backup rather than a lower weight target, but haproxy 
seemed to take longer to recover from failures in that mode.  I will 
have to do some more comprehensive testing.  If there's a better 
solution than haproxy that works with heartbeat, I can change that.

Thanks,
Shawn


Re: LuceneRevolution - NoSQL: A comparison

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.
This is what FAST does in ESP. When a new version of a partition is built, it is staged in its own process and co-exists alongside the old one. The query-dispatcher sees both and routes traffic based on requested "generation id".

Should probably not invest in such a feature until there's a clear demand in the form of in-the-field bug reports.
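The generation-id routing Jan describes could be sketched like this. This is a toy model of the idea, not FAST ESP's actual dispatcher; all names and addresses are invented.

```python
# Sketch of generation-id routing: old and new builds of a partition stay up
# side by side, and the dispatcher routes each request to the build matching
# its requested generation id (defaulting to the newest).

class Dispatcher:
    def __init__(self):
        self.builds = {}          # generation_id -> backend address
        self.current = None

    def stage(self, generation_id, backend):
        """A freshly built partition comes up alongside the old one."""
        self.builds[generation_id] = backend
        self.current = generation_id

    def route(self, generation_id=None):
        """No generation requested -> newest build; otherwise pin to that build."""
        gen = generation_id if generation_id is not None else self.current
        return self.builds[gen]

d = Dispatcher()
d.stage(41, "partition-a-old:9000")
d.stage(42, "partition-a-new:9001")
print(d.route())     # newest build: partition-a-new:9001
print(d.route(41))   # pinned to the old generation: partition-a-old:9000
```

A multi-phase query would pass the generation id from phase 1 into phase 2, guaranteeing both phases see the same build.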

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 12. okt. 2010, at 04.20, Yonik Seeley wrote:

> On Mon, Oct 11, 2010 at 8:32 PM, Peter Keegan <pe...@gmail.com> wrote:
>> I listened with great interest to Grant's presentation of the NoSQL
>> comparisons/alternatives to Solr/Lucene. It sounds like the jury is still
>> out on much of this. Here's a use case that might favor using a NoSQL
>> alternative for storing 'stored fields' outside of Lucene.
>> 
>> When Solr does a distributed search across shards, it does this in 2 phases
>> (correct me if I'm wrong):
>> 
>> 1. 1st query to get the docIds and facet counts
>> 2. 2nd query to retrieve the stored fields of the top hits
>> 
>> The problem here is that the index could change between (1) and (2), so it's
>> not an atomic transaction.
> 
> Yep.
> 
> As I discussed with Peter at Lucene Revolution, if this feature is
> important to people, I think the easiest way to solve it would be via
> leases.
> 
> During a query, a client could request a lease for a certain amount of
> time on whatever index version is used to generate the response.  Solr
> would then return the index version to the client along with the
> response, and keep the index open for that amount of time.  The client
> could make consistent additional requests (such as the 2nd phase of a
> distributed request)  by requesting the same version of the index.
> 
> -Yonik


Re: LuceneRevolution - NoSQL: A comparison

Posted by Dennis Gearon <ge...@sbcglobal.net>.
It sounds, of course, a lot like transaction isolation using MVCC. It's the obvious solution, and has been since the late 1970s.

I hope it won't be too hard to convince people to use it :-) It's been the reason for the early success of Oracle.

Dennis Gearon



--- On Mon, 10/11/10, Yonik Seeley <yo...@lucidimagination.com> wrote:

> From: Yonik Seeley <yo...@lucidimagination.com>
> Subject: Re: LuceneRevolution - NoSQL: A comparison
> To: solr-user@lucene.apache.org
> Date: Monday, October 11, 2010, 7:20 PM
> On Mon, Oct 11, 2010 at 8:32 PM,
> Peter Keegan <pe...@gmail.com>
> wrote:
> > I listened with great interest to Grant's presentation
> of the NoSQL
> > comparisons/alternatives to Solr/Lucene. It sounds
> like the jury is still
> > out on much of this. Here's a use case that might
> favor using a NoSQL
> > alternative for storing 'stored fields' outside of
> Lucene.
> >
> > When Solr does a distributed search across shards, it
> does this in 2 phases
> > (correct me if I'm wrong):
> >
> > 1. 1st query to get the docIds and facet counts
> > 2. 2nd query to retrieve the stored fields of the top
> hits
> >
> > The problem here is that the index could change
> between (1) and (2), so it's
> > not an atomic transaction.
> 
> Yep.
> 
> As I discussed with Peter at Lucene Revolution, if this
> feature is
> important to people, I think the easiest way to solve it
> would be via
> leases.
> 
> During a query, a client could request a lease for a
> certain amount of
> time on whatever index version is used to generate the
> response.  Solr
> would then return the index version to the client along
> with the
> response, and keep the index open for that amount of
> time.  The client
> could make consistent additional requests (such as the 2nd
> phase of a
> distributed request)  by requesting the same version
> of the index.
> 
> -Yonik
> 

Re: LuceneRevolution - NoSQL: A comparison

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Oct 11, 2010 at 8:32 PM, Peter Keegan <pe...@gmail.com> wrote:
> I listened with great interest to Grant's presentation of the NoSQL
> comparisons/alternatives to Solr/Lucene. It sounds like the jury is still
> out on much of this. Here's a use case that might favor using a NoSQL
> alternative for storing 'stored fields' outside of Lucene.
>
> When Solr does a distributed search across shards, it does this in 2 phases
> (correct me if I'm wrong):
>
> 1. 1st query to get the docIds and facet counts
> 2. 2nd query to retrieve the stored fields of the top hits
>
> The problem here is that the index could change between (1) and (2), so it's
> not an atomic transaction.

Yep.

As I discussed with Peter at Lucene Revolution, if this feature is
important to people, I think the easiest way to solve it would be via
leases.

During a query, a client could request a lease for a certain amount of
time on whatever index version is used to generate the response.  Solr
would then return the index version to the client along with the
response, and keep the index open for that amount of time.  The client
could make consistent additional requests (such as the 2nd phase of a
distributed request)  by requesting the same version of the index.

-Yonik
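The lease idea Yonik proposes could be sketched as follows. This is an in-memory stand-in for illustration only, not Solr code: "versions" are snapshots of a document list, and a lease keeps a version open past subsequent commits.

```python
# Sketch of query leases: phase 1 returns the index version it searched and
# takes a lease keeping that version open; phase 2 requests the same version
# while the lease lasts, so both phases see a consistent index.

class LeasedIndex:
    def __init__(self):
        self.version = 0
        self.open_versions = {0: {"docs": []}}   # version -> searcher snapshot
        self.leases = {}                          # version -> lease expiry time

    def commit(self, docs, now):
        """Create a new version; old ones stay open only while leased."""
        self.version += 1
        self.open_versions[self.version] = {"docs": list(docs)}
        for v in [v for v in self.open_versions
                  if v != self.version and self.leases.get(v, 0) <= now]:
            del self.open_versions[v]            # close unleased old versions

    def query(self, now, lease_secs=10, version=None):
        v = self.version if version is None else version
        if v not in self.open_versions:
            raise KeyError(f"version {v} no longer open")
        self.leases[v] = max(self.leases.get(v, 0), now + lease_secs)
        return v, self.open_versions[v]["docs"]

idx = LeasedIndex()
idx.commit(["a", "b"], now=0)
v, docs = idx.query(now=1, lease_secs=10)   # phase 1 takes a lease on v=1
idx.commit(["a", "b", "c"], now=2)          # the index changes in between...
print(idx.query(now=3, version=v))          # ...phase 2 still sees (1, ['a', 'b'])
```

Without the lease, the commit would have closed version 1 and phase 2 would either fail or silently run against the newer index.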

Re: LuceneRevolution - NoSQL: A comparison

Posted by Dennis Gearon <ge...@sbcglobal.net>.
Well,
     I think that if someone is searching the 'whole of the dataset' to find the 'individual data', then an SQL database outside of Solr makes just as much sense. There's plenty of data in the world, or in most applications, that needs to stay normalized or at least benefits from being kept that way.
Dennis Gearon



--- On Mon, 10/11/10, Peter Keegan <pe...@gmail.com> wrote:

> From: Peter Keegan <pe...@gmail.com>
> Subject: LuceneRevolution - NoSQL: A comparison
> To: solr-user@lucene.apache.org
> Date: Monday, October 11, 2010, 5:32 PM
> I listened with great interest to
> Grant's presentation of the NoSQL
> comparisons/alternatives to Solr/Lucene. It sounds like the
> jury is still
> out on much of this. Here's a use case that might favor
> using a NoSQL
> alternative for storing 'stored fields' outside of Lucene.
> 
> When Solr does a distributed search across shards, it does
> this in 2 phases
> (correct me if I'm wrong):
> 
> 1. 1st query to get the docIds and facet counts
> 2. 2nd query to retrieve the stored fields of the top hits
> 
> The problem here is that the index could change between (1)
> and (2), so it's
> not an atomic transaction. If the stored fields were kept
> outside of Lucene,
> only the first query would be necessary. However, this
> would mean that the
> external NoSQL data store would have to be synchronized
> with the Lucene
> index, which might present its own problems. (I'm just
> throwing this out for
> discussion)
> 
> Peter
>