Posted to solr-user@lucene.apache.org by Tim Patton <tp...@dealcatcher.com> on 2007/03/05 16:47:36 UTC
Re: Federated Search
Venkatesh Seetharam wrote:
> Hi Tim,
>
> Howdy. I saw your post on the Solr newsgroup and it caught my attention. I'm
> working on a similar problem for searching a vault of over 100 million
> XML documents. I already have the encoding part done using Hadoop and
> Lucene. It works like a charm. I create N index partitions and have
> been trying to wrap Solr to search each partition, with a search broker
> that merges the results and returns them.
>
> I'm curious about how you have solved the distribution of additions,
> deletions, and updates to each of the indexing servers. I use a
> partitioner based on a hash of the document ID. Do you broadcast to the
> slaves as to who owns a document?
>
> Also, I'm looking at Hadoop RPC and ICE ( www.zeroc.com
> <http://www.zeroc.com>) for distributing the search across these Solr
> servers. I'm not using HTTP.
>
> Any ideas are greatly appreciated.
>
> PS: I did subscribe to the Solr newsgroup just now but did not receive a
> confirmation, hence I am sending this to you directly.
>
> --
> Thanks,
> Venkatesh
>
> "Perfection (in design) is achieved not when there is nothing more to
> add, but rather when there is nothing more to take away."
> - Antoine de Saint-Exupéry
I used a SQL database to keep track of which server had which document.
Then I originally used JMS, with a selector for the server number the
document should go to. I switched over to a home-grown, lightweight
message server, since JMS behaves really badly when it backs up and I
couldn't find a server that would simply pause the producers if
there was a problem with the consumers. Additions are pretty much
assigned randomly to whichever server gets them first. At this point I
am up to around 20 million documents.
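The "pause the producers" behavior Tim couldn't find in a JMS broker is essentially backpressure. Within a single process it can be sketched with a bounded blocking queue; names here are illustrative, not taken from Tim's system:

```python
import queue
import threading

# Bounded queue: put() blocks once consumers fall behind, which pauses
# the producers instead of letting the message server back up.
work = queue.Queue(maxsize=1000)

def producer(doc_ids):
    for doc_id in doc_ids:
        work.put(doc_id)            # blocks while the queue is full

def consumer(index_one_doc):
    while True:
        doc_id = work.get()
        if doc_id is None:          # sentinel value: shut down cleanly
            break
        index_one_doc(doc_id)
```

A standalone message server would expose the same semantics over the wire; the point is only that the producer call blocks rather than failing or buffering without bound.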
The hash idea sounds really interesting and if I had a fixed number of
indexes it would be perfect. But I don't know how big the index will
grow and I wanted to be able to add servers at any point. I would like
to eliminate any outside dependencies (SQL, JMS), which is why a
distributed Solr would let me focus on other areas.
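The tension between hash partitioning and adding servers later is exactly what consistent hashing addresses: adding a node only remaps the documents that fall between it and its neighbor on the ring, not the whole corpus. A minimal sketch, with made-up server names and MD5 as an arbitrary hash choice:

```python
import bisect
import hashlib

# Consistent-hashing ring: each server gets many "virtual node" points
# so load stays balanced; a document belongs to the first point
# clockwise from its own hash.
class HashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual nodes per server
        self._keys = []            # sorted hash points on the ring
        self._owners = {}          # hash point -> server name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}:{i}")
            bisect.insort(self._keys, point)
            self._owners[point] = node

    def owner(self, doc_id):
        # First ring point clockwise from the document's hash.
        point = self._hash(doc_id)
        idx = bisect.bisect(self._keys, point) % len(self._keys)
        return self._owners[self._keys[idx]]
```

With this, `HashRing(["indexer1", "indexer2"]).owner("doc-42")` names the responsible server, and adding a third server moves only roughly a third of the documents instead of nearly all of them.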
How did you work around not being able to update a Lucene index that is
stored in Hadoop? I know there were changes in Lucene 2.1 to support
this, but I haven't looked that far into it yet; I've just been testing
the new IndexWriter. As an aside, I hope those features can be used by
Solr soon (if they aren't already in the nightlies).
Tim
Re: SPAM-LOW: Re: Federated Search
Posted by Tim Patton <tp...@dealcatcher.com>.
I have several indexes now (4 at the moment, 20GB each, and I want to be
able to drop in a new machine easily). I'm using SQL Server as the DB and
it scales well. The DB doesn't get hit too hard, mostly doing location
lookups, and the app does some checking to make sure a document has
really changed before updating it in the DB or the index. When a
new server is added it randomly picks up additions from the message
server (it's approximately round-robin) and the rest of the system
really doesn't even need to know about it.
I've realized partitioned indexing is a difficult but solvable problem.
It could be a big project, though. I mean, we have all solved it in our
own way, but no one has a general solution. Distributed searching might
be a better area to add to Solr, since that should basically be the same
for everyone. I'm going to mess around with Jini on my own indexes;
there's finally a new book out to go with the newer versions.
How were you planning on using Solr with Hadoop? Maybe I don't fully
understand how Hadoop works.
Tim
Venkatesh Seetharam wrote:
> Hi Tim,
>
> Thanks for your response. Interesting idea. Does the DB scale? Do you have
> one single index which you plan to use Solr for or you have multiple
> indexes?
>
>> But I don't know how big the index will grow and I wanted to be able to
> add servers at any point.
> I'm thinking of having N partitions with a max of 10 million documents per
> partition. Adding a server should not be a problem, but the newly added
> server would take time to grow so that the distribution of documents is equal
> across the cluster. I've tested with 50 million documents of 10 size each and it
> looks very promising.
>
>> The hash idea sounds really interesting and if I had a fixed number of
> indexes it would be perfect.
> I'm in fact looking for a reverse-hash algorithm where, given a
> docId, I should be able to find which partition contains the document, so I
> can save cycles on broadcasting to slaves.
>
> I mean, even if you use a DB, how have you solved the problem of
> distribution when a new server is added into the mix.
>
> We have the same problem since we get daily updates to documents and
> document metadata.
>
>> How did you work around not being able to update a lucene index that is
> stored in Hadoop?
> I do not use HDFS. I use a NetApp mounted on all the nodes in the cluster
> and hence did not need any change to Lucene.
>
> I plan to index using Lucene/Hadoop and use Solr as the partition searcher
> and a broker which would merge the results and return 'em.
>
> Thanks,
> Venkatesh
Re: Federated Search
Posted by Venkatesh Seetharam <vs...@gmail.com>.
Hi Jed,
Thanks for sharing your thoughts and the link.
Venkatesh
Re: Federated Search
Posted by Jed Reynolds <li...@benrey.is-a-geek.net>.
Venkatesh Seetharam wrote:
>
>> The hash idea sounds really interesting and if I had a fixed number of
> indexes it would be perfect.
> I'm in fact looking for a reverse-hash algorithm where, given a
> docId, I should be able to find which partition contains the document,
> so I can save cycles on broadcasting to slaves.
Many large databases partition their data either by load or by another
logical scheme, such as by alphabet. I hear that Hotmail, for instance,
partitions its users alphabetically. Having a broker will certainly
abstract this mechanism, and of course your application(s) will want to be
able to bypass the broker when necessary.
> I mean, even if you use a DB, how have you solved the problem of
> distribution when a new server is added into the mix.
http://www8.org/w8-papers/2a-webserver/caching/paper2.html
I saw this link on the memcached list, and the thread surrounding it
certainly covered some similar ground. Some of the ideas discussed:
- high availability of memcached, redundant entries
- scaling out clusters and facing the need to rebuild the entire cache
on all nodes, depending on your bucketing.
I see some similarities between maintaining multiple indices/Lucene
partitions and running a memcached deployment: if you are hashing
your keys to partitions (or buckets, or machines) then you might be faced
with a) availability issues if there's a machine/partition outage, and b)
rebuilding partitions if adding a partition/bucket changes the hash mapping.
The ways I can think of to scale out new indexes would be, first, to have your
application maintain two sets of bucket mappings from ids to indexes, and
second, to key your documents and partition them by date.
The former method would allow you to rebuild a second set of
repartitioned indexes and buckets, then update your
application to use the new bucket mapping (once all the indexes have been
rebuilt). The latter method would only apply if you could organize your
document ids by date and only added new documents at the 'now' end, or
evenly across most dates. You'd have to add a new partition onto the end
as time progressed, and rarely rebuild old indexes unless your documents
grow unevenly.
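The date-keyed scheme above amounts to a pure routing function from document date to partition name; a sketch, where quarterly granularity and the index naming are arbitrary choices for illustration:

```python
from datetime import date

def partition_for(doc_date: date) -> str:
    # New partitions appear only at the "now" end of the timeline, so
    # old indexes never need rebuilding as capacity is added.
    quarter = (doc_date.month - 1) // 3 + 1
    return f"index-{doc_date.year}-q{quarter}"
```

Because the mapping never changes for past dates, adding next quarter's partition moves no existing documents, which is the property the dual-bucket-mapping approach has to work much harder to get.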
Interesting topic! I don't yet need to run multiple Lucene partitions,
but I have a few memcached servers, and I expect that increasing their
number will force my site to take a performance hit as I am
forced to rebuild the caches. I can see, similarly, that if I had multiple
Lucene partitions and had to split some of them, rebuilding the
resulting partitions would be time-intensive, and I'd want to have
procedures in place for availability, scaling out, and changing
application code as necessary. Just having one fail-over Solr index is
just so easy in comparison.
Jed
Re: Federated Search
Posted by Venkatesh Seetharam <vs...@gmail.com>.
Hi Tim,
Thanks for your response. Interesting idea. Does the DB scale? Do you have
one single index which you plan to use Solr for, or do you have multiple
indexes?
> But I don't know how big the index will grow and I wanted to be able to
add servers at any point.
I'm thinking of having N partitions with a max of 10 million documents per
partition. Adding a server should not be a problem, but the newly added
server would take time to grow so that the distribution of documents is equal
across the cluster. I've tested with 50 million documents of 10 size each and it
looks very promising.
> The hash idea sounds really interesting and if I had a fixed number of
indexes it would be perfect.
I'm in fact looking for a reverse-hash algorithm where, given a
docId, I should be able to find which partition contains the document, so I
can save cycles on broadcasting to slaves.
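Strictly speaking, no reverse hash is needed if the partitioner is deterministic: the broker can recompute the same forward hash from the docId and go straight to the owning partition, with no broadcast. A sketch, where the fixed partition count is exactly the limitation Tim raised earlier in the thread:

```python
import hashlib

NUM_PARTITIONS = 8  # fixed; changing this remaps almost every document

def partition_of(doc_id: str) -> int:
    # The same function runs on the indexer and on the broker, so the
    # "reverse lookup" is just recomputing the forward hash.
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS
```

The indexer calls `partition_of` to route an add or delete, and the broker calls it to target a single slave for an update, which is the cycle-saving Venkatesh is after.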
I mean, even if you use a DB, how have you solved the problem of
distribution when a new server is added into the mix?
We have the same problem since we get daily updates to documents and
document metadata.
> How did you work around not being able to update a lucene index that is
stored in Hadoop?
I do not use HDFS. I use a NetApp mounted on all the nodes in the cluster
and hence did not need any change to Lucene.
I plan to index using Lucene/Hadoop and use Solr as the partition searcher
and a broker which would merge the results and return 'em.
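The broker's merge step Venkatesh describes can be sketched as a k-way merge of per-partition top-k lists. This assumes scores are comparable across partitions (consistent global IDF), which a real distributed Lucene deployment has to arrange separately:

```python
import heapq
import itertools

def merge_results(per_partition_hits, k=10):
    """Merge per-partition (doc_id, score) lists, each already sorted
    by descending score, into a single global top-k."""
    merged = heapq.merge(*per_partition_hits, key=lambda hit: -hit[1])
    return list(itertools.islice(merged, k))
```

Each partition only ever ships its own top k hits to the broker, so the merge cost depends on the number of partitions, not the total index size.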
Thanks,
Venkatesh
Re: Re[2]: Federated Search
Posted by Venkatesh Seetharam <vs...@gmail.com>.
Hi Jack,
Howdy. Comments are inline.
> is there any reason you don't want to use HTTP?
I've seen that Hadoop RPC is faster than HTTP. Also, since Solr returns an XML
response, you incur overhead in parsing and then merging it. I haven't done
scale testing with HTTP and XML responses.
> Do the searchers need to know who has what document?
This is necessary if you are doing updates to the document in the index.
> I suppose solr is ok to handle 20 million document
Storage is not an issue. If the index is huge, searches will take
time, and when you want 100 searches/second, that's really hard. I've read on the
Lucene newsgroup that Lucene works well with an index around 8-10GB, and
slows down when it's bigger than that. Since my index can run into many GBs,
I'd partition it.
> - If a hash value-based partitioning is used, re-indexing all the
document will be needed.
Why is that necessary? If a document has to be updated, you can broadcast to the
slaves as to who owns it, and then send an update to that slave.
Venkatesh
Re[2]: Federated Search
Posted by Tim Patton <tp...@dealcatcher.com>.
Jack,
My big stumbling blocks were with indexing more so than searching. I
did put together an RMI-based system to search multiple Lucene servers,
and the searchers don't need to know where everything is. However,
with indexing, at some point something needs to know where to send the
documents for updating, or whom to tell to delete a document, whether that
is the server that does the processing or some sort of broker. The
processing machines could do the DB lookup and talk to Solr over HTTP,
no problem, and this is part of what I am considering doing. However, I
have some extra code on the indexing machines to handle DB updates,
etc., though I might find a way to move this elsewhere in the system
so I can have pretty much a pure Solr server with just a few custom
items (like my own Similarity or QueryParser).
I suppose the DB could be moved from SQL to Lucene in the future as well.
Re[2]: Federated Search
Posted by Jack L <jl...@yahoo.ca>.
This is a very interesting discussion. I have a few questions while
reading Tim and Venkatesh's emails:
To Tim:
1. Is there any reason you don't want to use HTTP? Since Solr has
an HTTP interface already, I suppose using HTTP is the simplest
way to communicate with the Solr servers from the merger/search broker.
Hadoop and ICE would both require some additional work - this is
if you are using Solr and not Lucene directly.
2. "Do you broadcast to the slaves as to who owns a document?"
Do the searchers need to know who has what document?
To Venkatesh:
1. I suppose Solr is OK to handle 20 million documents - I hope I'm
right, because that's what I'm planning on doing :) Is it because
of storage capacity that you chose to use multiple Solr
servers?
An open question: what's the best way to manage server addition?
- If hash value-based partitioning is used, re-indexing all
the documents will be needed.
- Otherwise, a database seems to be required to track the documents.
--
Best regards,
Jack