Posted to solr-user@lucene.apache.org by Mike Austin <ma...@gmail.com> on 2007/04/26 01:14:07 UTC

Solr index updating pattern

Could someone give advice on a better way to do this?

I have an index of many merchants, and each day I delete merchant products
and re-update my database. After doing this I then re-create the entire
index and move it to production, replacing the current index.

I was thinking about updating the index in real time with only the products
that need updating. My concern is that I might be updating 2 million
products, deleting 1 million, and inserting another 1-2 million all in one
process. I guess I could send batches of files to be sucked in and
processed, but it's just not as clean as creating a new index. Do you see an
issue with these massive updates, deletes, and inserts in Solr? The problem
now is that I might only be updating 1/2 or 1/4 of the index, so I shouldn't
need to re-create the entire index each time.

How do some of you keep your indexes updated?  I'm running on Windows
Server, so I haven't even looked into the snappuller etc. stuff.

Thanks,
Mike
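

A minimal sketch of the batched delete/add pattern described above: posting
XML update messages to Solr's HTTP update handler from Java. The endpoint
URL, the merchant field, and the document IDs are illustrative assumptions,
not details from the original setup.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class BatchUpdate {
    // Illustrative endpoint; point this at your own Solr instance.
    private static final String UPDATE_URL = "http://localhost:8983/solr/update";

    // POST one XML update message to Solr; fail loudly on a non-200.
    static void post(String xml) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(UPDATE_URL).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes("UTF-8"));
        }
        if (conn.getResponseCode() != 200) {
            throw new RuntimeException("update failed: " + conn.getResponseCode());
        }
    }

    public static void main(String[] args) throws Exception {
        // Drop one merchant's stale products, re-add the fresh rows,
        // then commit once at the end of the whole batch.
        post("<delete><query>merchant:acme</query></delete>");
        post("<add><doc>"
                + "<field name=\"id\">SKU-1001</field>"
                + "<field name=\"merchant\">acme</field>"
                + "<field name=\"name\">Example product</field>"
                + "</doc></add>");
        post("<commit/>");
    }
}

Batching many <doc> elements into a single <add>, and committing only once
per batch, keeps the request count and commit overhead down.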


Re: Solr index updating pattern

Posted by Yonik Seeley <yo...@apache.org>.
On 4/26/07, Mike Klaas <mi...@gmail.com> wrote:
> On 4/25/07, Mike Austin <ma...@gmail.com> wrote:
> > Could someone give advice on a better way to do this?
> >
> > I have an index of many merchants, and each day I delete merchant products
> > and re-update my database. After doing this I then re-create the entire
> > index and move it to production, replacing the current index.
> >
> > I was thinking about updating the index in real time with only the products
> > that need updating. My concern is that I might be updating 2 million
> > products, deleting 1 million, and inserting another 1-2 million all in one
> > process. I guess I could send batches of files to be sucked in and
> > processed, but it's just not as clean as creating a new index. Do you see an
> > issue with these massive updates, deletes, and inserts in Solr? The problem
> > now is that I might only be updating 1/2 or 1/4 of the index, so I shouldn't
> > need to re-create the entire index each time.
>
> There isn't necessarily an issue, but there is definitely some
> overhead in updating/deleting docs compared with simply writing new
> docs.  I've found that re-writing an entire index (i.e. updating every
> document) is about twice as slow as wiping the index first.

I wish there were a programmatic way to wipe out the index, but because
platforms like Windows won't let you delete files that are still open,
it's not really possible.  Perhaps Lucene needs this feature...
Due to the new index format, it would be relatively easy to write a
new segments file that simply dropped all of the existing segments.
Cleanup of the old segments could happen as it does now.

-Yonik
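
A rough sketch of the wipe Yonik describes, against the Lucene API of that
era: opening an IndexWriter with create=true writes a fresh segments file
that references no existing segments, and the old segment files go away
through Lucene's normal deferred cleanup. (Later Lucene releases exposed
this directly as IndexWriter.deleteAll().) The index path here is an
illustrative assumption.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class WipeIndex {
    public static void main(String[] args) throws Exception {
        // Illustrative path; this would be Solr's data/index directory.
        Directory dir = FSDirectory.getDirectory("/var/solr/data/index");

        // create=true writes a new segments file that references no
        // segments; the old segment files are removed by Lucene's usual
        // deferred cleanup rather than deleted in place.
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        writer.close();
    }
}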

Re: Solr index updating pattern

Posted by Mike Klaas <mi...@gmail.com>.
On 4/25/07, Mike Austin <ma...@gmail.com> wrote:
> Could someone give advice on a better way to do this?
>
> I have an index of many merchants, and each day I delete merchant products
> and re-update my database. After doing this I then re-create the entire
> index and move it to production, replacing the current index.
>
> I was thinking about updating the index in real time with only the products
> that need updating. My concern is that I might be updating 2 million
> products, deleting 1 million, and inserting another 1-2 million all in one
> process. I guess I could send batches of files to be sucked in and
> processed, but it's just not as clean as creating a new index. Do you see an
> issue with these massive updates, deletes, and inserts in Solr? The problem
> now is that I might only be updating 1/2 or 1/4 of the index, so I shouldn't
> need to re-create the entire index each time.

There isn't necessarily an issue, but there is definitely some
overhead in updating/deleting docs compared with simply writing new
docs.  I've found that re-writing an entire index (i.e. updating every
document) is about twice as slow as wiping the index first.

-Mike
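
The overhead described here is visible at the Lucene level: an update is a
delete-by-term followed by an add, and the delete must be resolved against
the existing segments, a cost a fresh index never pays. A minimal sketch
against the Lucene 2.x API, with the path, field names, and ID purely
illustrative.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;

public class UpdateVsAdd {
    public static void main(String[] args) throws Exception {
        // Illustrative path; open the existing index (create=false).
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/var/solr/data/index"),
                new StandardAnalyzer(), false);

        Document doc = new Document();
        doc.add(new Field("id", "SKU-1001",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("name", "Example product",
                Field.Store.YES, Field.Index.TOKENIZED));

        // updateDocument is a delete-by-term plus an add; the buffered
        // delete has to be applied against every existing segment, which
        // is the extra cost compared to adding into an empty index.
        writer.updateDocument(new Term("id", "SKU-1001"), doc);

        writer.close();
    }
}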