Posted to solr-user@lucene.apache.org by Daniel Shane <sh...@LEXUM.UMontreal.CA> on 2010/02/16 20:29:30 UTC
Preventing mass index delete via DataImportHandler full-import
I've set up a simple DataImportHandler (DIH) in Solr that pulls my data from a database.
I have a small worry though. When I call the full-import functions, can I configure Solr (via the XML files) to make sure there are rows to index before wiping everything? What worries me is if, for some unknown reason, we have an empty database, then the full-import will just wipe the live index and the search will be broken.
I don't think it's possible, but I'm new to Solr so it's quite possible I've overlooked how this could be done.
Thanks in advance for any help!
Daniel Shane
Re: Preventing mass index delete via DataImportHandler full-import
Posted by Chris Hostetter <ho...@fucit.org>.
: That's what I thought. I think I'll take the time to add something to the
: DIH to prevent such things. Maybe a parameter that will cause the import
: to bail out if the documents to index are less than X % of the total
: number of documents already in the index.
the devil's in the details though ... to do an efficient "full-import" DIH
deletes the index before it starts indexing anything, and for an
arbitrary datasource with an arbitrary set of entities and sub entities
and various layers of logic it seems like it would be infeasible to know
how many rows you are going to get before you actually start.
I think this sort of thing would pretty much have to be done post-import
(w/o doing the initial delete), counting the number of docs added, and
deleting all of the ones older than that (using a deleteQuery based on a
timestamp field) if the number is above a percentage threshold.
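A rough sketch of that post-import decision, in Python rather than inside DIH itself. Everything here is an assumption for illustration: the function name, the "timestamp" field name, the 50% default, and the idea that the caller recorded the import start time and the doc counts on its own. It only builds the delete-by-query message; posting it to the update handler is left to the caller.

```python
from datetime import datetime, timezone


def stale_docs_delete_xml(import_started, docs_added, docs_before,
                          min_ratio=0.5, timestamp_field="timestamp"):
    """Return delete-by-query XML for docs older than this import, or None.

    None means too few docs were added, so the old docs are left in place.
    All names and the default ratio are hypothetical.
    """
    # Too few new docs compared with what the index held: keep the old docs.
    if docs_before > 0 and docs_added < min_ratio * docs_before:
        return None
    # Solr date fields expect UTC in this ISO-8601 form.
    cutoff = import_started.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Inclusive upper bound; assumes cutoff was captured before the first add.
    return "<delete><query>%s:[* TO %s]</query></delete>" % (timestamp_field, cutoff)
```

This assumes every document carries an index-time timestamp field, so anything at or before the cutoff must come from an earlier import.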
Of course: none of this helps you with the possibility that you have
plenty of docs, but they all contain useless data (maybe some nested
entity query failed so you have no searchable text) ... logic for sanity
checking an index tends to be fairly domain specific.
-Hoss
Re: Preventing mass index delete via DataImportHandler full-import
Posted by Daniel Shane <sh...@LEXUM.UMontreal.CA>.
That's what I thought. I think I'll take the time to add something to the DIH to prevent such things. Maybe a parameter that will cause the import to bail out if the documents to index are less than X % of the total number of documents already in the index.
There would also be a parameter to override this manually.
I think it would be a good safety precaution.
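As a minimal sketch of the check described above, written as a caller-side guard rather than as a DIH parameter (no such parameter exists in the thread). The function name, argument names, and the 50% default are hypothetical, and it assumes the caller can get both counts itself, for example from a COUNT(*) on the database and numFound from a match-all query.

```python
def should_run_full_import(rows_to_index, docs_in_index,
                           min_percent=50.0, force=False):
    """Decide whether a full-import is safe to start.

    Hypothetical guard: refuse when the source holds fewer rows than
    min_percent of the docs already indexed, unless force is set.
    """
    if force:
        # Manual override, as suggested in the message above.
        return True
    if docs_in_index == 0:
        # Nothing in the index to lose, so nothing to protect.
        return True
    return rows_to_index * 100.0 >= min_percent * docs_in_index
```

For example, should_run_full_import(0, 120000) refuses an import from an empty database, while passing force=True lets it through anyway.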
Daniel Shane
Re: Preventing mass index delete via DataImportHandler full-import
Posted by Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>.
On Wed, Feb 17, 2010 at 8:03 AM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : I have a small worry though. When I call the full-import functions, can
> : I configure Solr (via the XML files) to make sure there are rows to
> : index before wiping everything? What worries me is if, for some unknown
> : reason, we have an empty database, then the full-import will just wipe
> : the live index and the search will be broken.
>
> I believe if you set clear=false when doing the full-import, DIH won't
it is clean=false
or use command=import instead of command=full-import
> delete the entire index before it starts. it probably makes the
> full-import slower (most of the adds wind up being deletes followed by
> adds) but it should prevent you from having an empty index if something
> goes wrong with your DB.
>
> the big catch is you now have to be responsible for managing deletes
> (using the XmlUpdateRequestHandler) yourself ... this bug looks like its
> goal is to make this easier to deal with (but it's not really clear to
> me what "deletedPkQuery" is ... it doesn't seem to be documented).
>
> https://issues.apache.org/jira/browse/SOLR-1168
>
>
>
> -Hoss
>
>
--
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com
Re: Preventing mass index delete via DataImportHandler full-import
Posted by Chris Hostetter <ho...@fucit.org>.
: I have a small worry though. When I call the full-import functions, can
: I configure Solr (via the XML files) to make sure there are rows to
: index before wiping everything? What worries me is if, for some unknown
: reason, we have an empty database, then the full-import will just wipe
: the live index and the search will be broken.
I believe if you set clear=false when doing the full-import, DIH won't
delete the entire index before it starts. it probably makes the
full-import slower (most of the adds wind up being deletes followed by
adds) but it should prevent you from having an empty index if something
goes wrong with your DB.
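For illustration, a small Python sketch that builds such a request URL. Per the correction elsewhere in this thread the parameter is spelled clean, not clear. The "/dataimport" path is an assumption (it depends on how the handler is registered in solrconfig.xml), and the helper name and base URL are hypothetical.

```python
from urllib.parse import urlencode


def dih_full_import_url(base_url, clean=False):
    """Build a DIH full-import URL; clean=False keeps the existing index."""
    params = {
        "command": "full-import",
        # DIH reads this as the string "true" or "false".
        "clean": "true" if clean else "false",
    }
    # "/dataimport" is assumed; use whatever path the handler is mapped to.
    return base_url.rstrip("/") + "/dataimport?" + urlencode(params)


print(dih_full_import_url("http://localhost:8983/solr"))
```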
the big catch is you now have to be responsible for managing deletes
(using the XmlUpdateRequestHandler) yourself ... this bug looks like its
goal is to make this easier to deal with (but it's not really clear to
me what "deletedPkQuery" is ... it doesn't seem to be documented).
https://issues.apache.org/jira/browse/SOLR-1168
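A hedged sketch of what managing deletes yourself could look like: building the XML delete message that the update handler accepts, from a list of primary keys the caller already knows are gone from the database. The helper name is hypothetical, and how you work out which keys were removed is not covered here.

```python
from xml.sax.saxutils import escape


def delete_by_ids_xml(ids):
    """Build a Solr XML delete message for the given unique keys."""
    # Escape &, < and > so arbitrary key values stay well-formed XML.
    body = "".join("<id>%s</id>" % escape(str(i)) for i in ids)
    return "<delete>%s</delete>" % body


print(delete_by_ids_xml(["doc-1", "doc-2"]))
```

The message would be POSTed to the update handler and followed by a commit for the deletes to become visible.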
-Hoss