Posted to solr-user@lucene.apache.org by Mohamed Parvez <pa...@gmail.com> on 2009/12/28 21:51:49 UTC

Remove the deleted docs from the Solr Index

I am using Solr 1.4 and DIH to build the index from a table.

I use a full import once to create the index, and then I keep using delta
imports to update it.

All works fine as long as only new rows are added to the table.

If rows are deleted from the table, the index is not updated and holds
stale data.

I have looked into the options below:

1] deletedPkQuery and deltaImportQuery :-
Both are of no use, as no SQL query can return a row that has already been
deleted from the table.

2] preImportDeleteQuery and postImportDeleteQuery :-
Both are of no use, as they work only with full import. (Also, no SQL query
can return a row that has already been deleted from the table.)

3] Creating a new field in the table, named delete, and setting it to true
instead of actually deleting the row :-
This is not possible, as the table is updated from an external content
management system with news updates, and has 1000 new rows added and 1000
deleted every hour. We can't keep rows in the table when the content has
expired.

Am I missing something? How do I keep the index updated when rows are
deleted from the table?


------

Re: Remove the deleted docs from the Solr Index

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Dec 29, 2009 at 3:03 AM, Mohamed Parvez <pa...@gmail.com> wrote:

> I looked at that thread earlier, but it offers no solution on the Solr
> side.
>
> The two additional options there are:
> 1] Use database triggers instead of DIH to manage updating the index :-
> This is out of the question, as we can't run 1000-odd triggers every hour
> to handle deletes.
>
> 2] Use some sort of ORM and its interception :-
> This is also out of the question, as the deletes happen from an external
> system or directly on the database, not through our application.
>
>
> In short, Solr should have something to keep the index in sync with the
> database. As of now it is a one-way street: updated rows in the DB go to
> the index, but rows deleted from the DB are not deleted from the index.
>
>
How can Solr figure out what has been deleted? Should it go through each row
and compare it against each doc? Even then some things are not possible
(think indexed fields). It would be far more efficient to just do a
full-import each time instead.
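
As a rough illustration of the row-by-row comparison mentioned above, here is
a minimal Python sketch. It is not DIH functionality. It assumes the unique key
of every indexed doc can be listed from Solr and every primary key can be
listed from the table; the function names and sample IDs are hypothetical. It
diffs the two ID sets and builds one XML delete-by-id message per stale doc,
which would still have to be posted to Solr's update handler and committed.

```python
from xml.sax.saxutils import escape


def stale_ids(index_ids, db_ids):
    """Return IDs that are in the index but no longer in the database."""
    return sorted(set(index_ids) - set(db_ids))


def delete_messages(ids):
    """Build one Solr XML delete-by-id message per stale ID."""
    return ["<delete><id>%s</id></delete>" % escape(str(i)) for i in ids]


if __name__ == "__main__":
    # Hypothetical sample data; in practice one list would come from Solr
    # and the other from the source table.
    indexed = ["101", "102", "103", "104"]
    in_table = ["101", "103"]
    for message in delete_messages(stale_ids(indexed, in_table)):
        print(message)
```

This only detects deletions by key. It cannot tell whether the remaining docs
still match their rows, and listing every key on both sides may cost about as
much as the full-import suggested above.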

-- 
Regards,
Shalin Shekhar Mangar.

Re: Remove the deleted docs from the Solr Index

Posted by Mohamed Parvez <pa...@gmail.com>.
I looked at that thread earlier, but it offers no solution on the Solr
side.

The two additional options there are:
1] Use database triggers instead of DIH to manage updating the index :-
This is out of the question, as we can't run 1000-odd triggers every hour
to handle deletes.

2] Use some sort of ORM and its interception :-
This is also out of the question, as the deletes happen from an external
system or directly on the database, not through our application.
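
For reference, one way the trigger idea in option 1 is sometimes set up is
sketched below in Python with an in-memory SQLite database. This is only an
illustration under assumptions: the table names news and deleted_news, the
trigger name, and the helper functions are hypothetical, and the real database
here is unknown. The trigger does not call Solr; it only records each deleted
primary key in a side table, which a query such as DIH's deletedPkQuery could
then read.

```python
import sqlite3


def build_demo_db():
    """Create a source table, a tombstone table, and a delete trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE deleted_news (
            id INTEGER PRIMARY KEY,
            deleted_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        -- Record the key of every row removed from news.
        CREATE TRIGGER news_log_delete AFTER DELETE ON news
        BEGIN
            INSERT OR REPLACE INTO deleted_news (id) VALUES (OLD.id);
        END;
    """)
    return conn


def deleted_ids(conn):
    """Return the keys recorded by the trigger, in ascending order."""
    rows = conn.execute("SELECT id FROM deleted_news ORDER BY id")
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    conn.executemany(
        "INSERT INTO news (id, title) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c")],
    )
    conn.execute("DELETE FROM news WHERE id IN (1, 3)")
    print(deleted_ids(conn))
```

Whether that per-row overhead is acceptable at the delete rate described above
is a separate question, and the side table would need periodic cleanup.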


In short, Solr should have something to keep the index in sync with the
database. As of now it is a one-way street: updated rows in the DB go to
the index, but rows deleted from the DB are not deleted from the index.


---
Thanks



On Mon, Dec 28, 2009 at 3:23 PM, Mauricio Scheffer <
mauricioscheffer@gmail.com> wrote:

> Here's a couple more options:
>
>
> http://stackoverflow.com/questions/1555610/solr-dih-how-to-handle-deleted-documents/
> Cheers,
> Mauricio

Re: Remove the deleted docs from the Solr Index

Posted by Mauricio Scheffer <ma...@gmail.com>.
Here's a couple more options:

http://stackoverflow.com/questions/1555610/solr-dih-how-to-handle-deleted-documents/

Cheers,
Mauricio
