Posted to solr-user@lucene.apache.org by Mike O'Leary <tm...@uw.edu> on 2012/02/23 03:05:58 UTC

Is there a way to write a DataImportHandler deltaQuery that compares contents still to be imported to contents in the index?

I am working on indexing the contents of a database that I don't have permission to alter. The DataImportHandler examples that show how to specify a deltaQuery attribute value all use database tables that have a last_modified column, and the query compares those values with the last_index_time value stored in the dataimport.properties file. The tables in the database I am working with don't have anything like a last_modified column.

An indexing job I was running yesterday failed, and I would like to restart it so that it only imports the data that hasn't already been indexed. As a one-off, I could create a list of the keys of the database records that have been indexed and hack in something that reads that list when deciding what to index, but I was wondering whether there is something built in that would let me do the same kind of comparison in a far more elegant way.

What kinds of information do the deltaQuery attributes have access to, apart from the database tables, columns, etc.? Do they have access to any information that would help me with what I want to do?
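
To make the one-off workaround concrete, here is a minimal Python sketch of the key-list comparison I have in mind. Everything in it is hypothetical: the helper names, the one-key-per-line file format, and the idea that the keys are plain strings are all assumptions for illustration, and none of it is part of DataImportHandler.

```python
from typing import Iterable, Iterator, Set


def load_indexed_keys(path: str) -> Set[str]:
    """Read one already-indexed primary key per line, skipping blank lines.

    The one-key-per-line file format is an assumption made for this sketch.
    """
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def pending_keys(all_keys: Iterable[str], indexed: Set[str]) -> Iterator[str]:
    """Yield, in source order, the keys that are not in the indexed set."""
    for key in all_keys:
        if key not in indexed:
            yield key
```

The keys yielded by pending_keys would then drive whatever does the actual import, which is the part I would have to hack in by hand.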
Thanks,
Mike

P.S. While we're on the subject of delta... attributes, can someone explain to me what the difference is between the deltaQuery and the deltaImportQuery attributes?