Posted to solr-user@lucene.apache.org by "Welty, Richard" <rw...@ltionline.com> on 2012/04/18 16:48:13 UTC

pushing updates to solr from postgresql

i currently have a setup where the dataimporthandler pulls data for an index from a postgresql server.

i'd like to switch over to push, and am looking for some validation of my approach.

i have perl installed as an untrusted language on my postgresql server and am planning to set up triggers on the tables where insert/update/delete operations should cause an update of the relevant solr indexes. the trigger functions will build xml in the format for UpdateXmlMessages and notify Solr via http requests.
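
For illustration, the message-building step described above could look roughly like the following. This is a minimal sketch in Python rather than the Perl the trigger would actually use; the endpoint URL, the field names, and the function names are assumptions, not part of the original setup.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint; the real host, port, and core path depend on the install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(fields):
    """Build an <add> update message for one document from a dict of fields."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


def build_delete_xml(doc_id):
    """Build a <delete> update message that removes one document by id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def post_to_solr(xml_message, url=SOLR_UPDATE_URL, timeout=5):
    """POST an update message to Solr; raises on network or HTTP errors."""
    request = urllib.request.Request(
        url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

An insert or update trigger would call something like `post_to_solr(build_add_xml(row))`, and a delete trigger `post_to_solr(build_delete_xml(row_id))`. A commit (explicit or via autocommit) is still needed before the changes become searchable.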


is this sensible, or am i missing something easier?

also, does anyone have any thoughts about coordinating initial indexing/full reindexing via dataimporthandler with the trigger based push operations?

thanks,
   richard

Re: pushing updates to solr from postgresql

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Richard,

One thing to think about here is what you will do when Solr is unavailable to take a new document for whatever reason.  If you send docs to Solr from PG, docs either get indexed or not.  So you may have to catch errors and then mark documents in PG as not indexed.  You may want to keep track of initial and/or last index attempt and the total number of indexing attempts (new DB columns) and will probably want to use DIH to "pick up" unindexed documents from PG and get them indexed.
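
The bookkeeping described above could be sketched as follows. This is only an illustration: it uses an in-memory SQLite table as a stand-in for PG, and the table name, column names, and the `send` callable are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def create_table(conn):
    """Create a hypothetical docs table with index-tracking columns."""
    conn.execute(
        """CREATE TABLE docs (
               id INTEGER PRIMARY KEY,
               body TEXT,
               indexed INTEGER NOT NULL DEFAULT 0,
               first_index_attempt TEXT,
               last_index_attempt TEXT,
               index_attempts INTEGER NOT NULL DEFAULT 0
           )"""
    )


def try_index(conn, doc_id, send):
    """Try to push one document; record the outcome either way.

    `send` is any callable that raises when Solr cannot take the document.
    """
    now = datetime.now(timezone.utc).isoformat()
    try:
        send(doc_id)
        ok = 1
    except Exception:
        ok = 0  # leave the row marked as not indexed for a later pickup
    conn.execute(
        """UPDATE docs
              SET indexed = ?,
                  first_index_attempt = COALESCE(first_index_attempt, ?),
                  last_index_attempt = ?,
                  index_attempts = index_attempts + 1
            WHERE id = ?""",
        (ok, now, now, doc_id),
    )
    conn.commit()
    return bool(ok)


def unindexed_ids(conn):
    """Ids a periodic job (for example a DIH query) would need to pick up."""
    rows = conn.execute("SELECT id FROM docs WHERE indexed = 0 ORDER BY id")
    return [row[0] for row in rows]
```

A DIH query filtering on the `indexed` column (or its equivalent in the real schema) could then pick up whatever the push path missed.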

Also keep in mind that sending docs to Solr one by one will not be as efficient as sending batches of them or as efficient as getting a batch of them via DIH.  If your data volume is low this likely won't be a problem, but if it is high or growing, you'll want to keep this in mind.
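
As a rough illustration of batching, several documents can go into a single `<add>` message instead of one request per row. The sketch below assumes documents arrive as dicts of field values; the function names and the batch size are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def build_batch_add_xml(docs):
    """Build one <add> message containing a <doc> element per document."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")
```

With pending rows queued somewhere (a table, for instance), a worker could loop over `batches(pending, 500)` and send one message per chunk. A suitable batch size depends on document size and would need measuring.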

Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html


