Posted to solr-user@lucene.apache.org by Kristian Rink <ka...@gmail.com> on 2013/05/28 08:31:56 UTC

delta-import tweaking?

Folks;

playing with Solr and an existing (legacy) RDBMS structure which we
can't change much, I am trying to figure out how best to make Solr's
full/delta import work for me. A few thoughts:

(a) The usual tutorials outline something like 

WHERE LASTMODIFIED > '${dih.last_index_time}'

in order to select what needs to be re-imported. So far I haven't
managed to figure out what other information the DataImportHandler
exposes. In my case, something like

WHERE ID > '${dih.last_indexed_id}'

definitely would help as objects are just added to that very structure
and the ID is an auto-increment; however I couldn't find whether
DataImportHandler supports such an approach.

(b) I see that "last_index_time" is returned in one particular, fixed
format. In our database, with a modestly more complex SELECT, we could
also figure out which entities have changed using a protocol table
which includes timestamps in seconds since the epoch. Is there some way
of retrieving such a timestamp from DataImportHandler, or will I have
to do so somehow on my own?
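
For what it is worth, doing the conversion on one's own is little work
once the timestamp string is in hand. A minimal sketch in Python (not
part of DIH), assuming last_index_time has the form
"yyyy-MM-dd HH:mm:ss" and that the time zone it was written in is
known; the function name and the UTC default are assumptions made for
illustration:

```python
from datetime import datetime, timezone

# Assumed layout of DIH's last_index_time, e.g. "2013-05-28 08:31:56".
DIH_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def dih_time_to_epoch(last_index_time: str, tz=timezone.utc) -> int:
    """Convert a DIH-style timestamp string to seconds since the epoch.

    The UTC default is an assumption; pass the zone in which the Solr
    server actually writes its timestamps.
    """
    parsed = datetime.strptime(last_index_time, DIH_TIME_FORMAT)
    return int(parsed.replace(tzinfo=tz).timestamp())


if __name__ == "__main__":
    print(dih_time_to_epoch("2013-05-28 08:31:56"))
```

The resulting integer can then be compared directly against the
epoch-second column of the protocol table.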

Thanks in advance and all the best,
Kristian

Re: delta-import tweaking?

Posted by Kristian Rink <ka...@gmail.com>.
Hi Shawn;

and first off, thanks bunches for your pointers.

On Tue, 28 May 2013 09:31:54 -0600,
Shawn Heisey <so...@elyograg.org> wrote:
> My workaround was to store the highest indexed autoincrement value in
> a location outside Solr.  In my original Perl code, I dropped it into
> a file on NFS.  The latest iteration of my indexing code (Java, using 
> SolrJ) no longer uses DIH for regular indexing, but it still uses
> that stored autoincrement value, this time in another database
> table.  I do still use full-import for complete index rebuilds.

Well, overall, after playing with it a bit last night, I decided to go
down the SolrJ route as well; we are likely to use it in the future
anyway, as the rest of our environment is Java too, so going for it
right now seems the logical thing to do.

Thanks and all the best! 
Kristian 

Re: delta-import tweaking?

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/28/2013 12:31 AM, Kristian Rink wrote:
> (a) The usual tutorials outline something like
>
> WHERE LASTMODIFIED > '${dih.last_index_time}

[snip]

> (b) I see that "last_index_time" returns a particularly fixed format.
> In our database, with a modestly more complex SELECT, we also could
> figure out which entities have been changed using some protocol table
> which includes timestamps in seconds since EPOCH. Is there some way of
> retrieving such a timestamp from DataImportHandler or will I have to do
> so somehow on my own?

Your situation sounds like mine.  I found a workaround, but filed 
SOLR-1920 anyway to try and get support for tracking something besides 
the current time.  After nearly three years with no motion, and not 
being able to do it myself, I finally closed it:

https://issues.apache.org/jira/browse/SOLR-1920

My workaround was to store the highest indexed autoincrement value in a 
location outside Solr.  In my original Perl code, I dropped it into a 
file on NFS.  The latest iteration of my indexing code (Java, using 
SolrJ) no longer uses DIH for regular indexing, but it still uses that 
stored autoincrement value, this time in another database table.  I do 
still use full-import for complete index rebuilds.
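
A minimal sketch of the "store it in a file" variant, written in Python
rather than the Perl or Java mentioned above. The helper names and the
idea of treating a missing file as ID 0 are assumptions for
illustration, not the original code:

```python
from pathlib import Path


def load_last_id(state_file: Path) -> int:
    """Return the stored highest indexed ID, or 0 if none is stored yet."""
    try:
        return int(state_file.read_text().strip())
    except FileNotFoundError:
        return 0


def save_last_id(state_file: Path, new_id: int) -> None:
    """Persist new_id, never moving the stored value backwards."""
    if new_id > load_last_id(state_file):
        tmp = state_file.with_suffix(".tmp")
        tmp.write_text(f"{new_id}\n")
        # Rename over the old file so readers never see a partial write.
        tmp.replace(state_file)
```

The indexing job would call load_last_id() before an import and
save_last_id() only after the import has succeeded, so a failed run is
simply retried from the old value.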

You can pass arbitrary parameters into Solr via the dataimport URL.  If 
you pass in a variable called maxId, then you can access that in your 
DIH config with ${dih.request.maxId} and use it any way you like.
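
As a rough sketch of the calling side, in Python: the host, core name
and helper function below are made up for illustration, and the query
in the comment is only one way the parameter might be used in a DIH
config.

```python
from urllib.parse import urlencode


def build_import_url(base_url: str, max_id: int,
                     command: str = "delta-import") -> str:
    """Build a dataimport URL that carries maxId as a request parameter.

    On the DIH side the value would then be available as
    ${dih.request.maxId}, for example in a query such as:
        SELECT ... WHERE ID > '${dih.request.maxId}'
    """
    params = {"command": command, "maxId": max_id}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical Solr location and core name.
    print(build_import_url("http://localhost:8983/solr/core0/dataimport", 12345))
```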

Thanks,
Shawn