Posted to solr-user@lucene.apache.org by Jérôme Droz <je...@transmediaco.com> on 2011/03/03 16:45:23 UTC

deletedPKQuery does not work with compound PK

Hello,

I'm using the DataImportHandler (DIH) to import documents from a
database. Documents in the index represent a relationship between two
entities, units and dealpoints ("unit has dealpoint"), so each document
key in the index corresponds to a compound SQL key. Full import works
fine. To optimize the import process, I set up both the database and the
DIH configuration file for delta-import.

I added 3 more tables, updated by triggers: a table tracking 
modification time of units, another one tracking modification time of 
dealpoints, and the last one used to track deleted "units having a 
dealpoint".
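
As a rough sketch, the three tracking tables could be declared as shown
below. The table and column names are the ones used in the queries later
in this message; the column types and the choice of primary keys are
assumptions made for illustration, not the actual definitions.

```sql
-- Simplified sketch: column types and keys are assumptions.

-- Last modification time of each unit (filled by triggers).
create table unit_state (
    unit_id             int       not null primary key,
    unit_state_last_mod timestamp not null
);

-- Last modification time of each dealpoint (filled by triggers).
create table deal_point_state (
    deal_point_id             int       not null primary key,
    deal_point_state_last_mod timestamp not null
);

-- Keys of deleted "unit has dealpoint" relationships,
-- stored in the same 'unitId-dealpointId' form as the index keys.
create table unit_deal_point_delete (
    id varchar(64) not null primary key
);
```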

The uniqueKey field of the schema is defined as follows:

<field name="id" type="string" indexed="true" stored="true" 
required="true" multiValued="false" />
...
<uniqueKey>id</uniqueKey>

Keys are generated by concatenating the unit id and the dealpoint id, 
separated by '-', in the SQL query.
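
For illustration, the key expression behaves as follows in MySQL. The
values 12 and 34 are made up and stand in for a unit id and a dealpoint
id.

```sql
-- Hypothetical ids, used only to show the shape of the generated key.
select concat_ws('-', cast(12 as char), cast(34 as char)) as id;
-- yields the string '12-34'
```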

Below is a trimmed-down version of the data-config.xml I'm using (the
original is quite large and might be confusing):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://somehost:3306/somedatabase"
              user="user"
              password="**********" />
  <document name="unitdealpoints">
    <entity name="unitdealpoint" pk="unit_id,dealpoint_id"
            query="select concat_ws('-', cast(u.unit_id as char), cast(dp.deal_point_id as char)) as id, ...
                   from unit u, deal_point dp, ... where ..."
            deltaQuery="select us.unit_id as unit_id, dps.deal_point_id as dealpoint_id
                        from unit_state us, deal_point_state dps
                        where us.unit_state_last_mod &gt; '${dataimporter.last_index_time}'
                           or dps.deal_point_state_last_mod &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="select concat_ws('-', cast(u.unit_id as char), cast(dp.deal_point_id as char)) as id, ...
                              from unit u, deal_point dp, ...
                              where (u.unit_id = '${dataimporter.delta.unit_id}'
                                  or dp.deal_point_id = '${dataimporter.delta.dealpoint_id}') and ..."
            deletedPKQuery="select id from unit_deal_point_delete">
      ...
    </entity>
  </document>
</dataConfig>

I specifically chose to track deleted entities in a dedicated
(unit_deal_point_delete) table in order to avoid the known (and
apparently unsolved) bugs described here:
https://issues.apache.org/jira/browse/SOLR-1229?focusedCommentId=12722427&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12722427

The id field in the unit_deal_point_delete table has the exact same 
representation as the document keys. Below is an example of a trigger:

create trigger unit_delete_before before delete on unit
for each row
begin
     insert ignore into unit_deal_point_delete (id)
     select concat_ws('-', cast(old.unit_id as char), cast(dpu.deal_point_id as char))
     from deal_point_unit dpu
     where dpu.unit_id = old.unit_id;
end;

The delta and delta-import queries work fine, but the deletedPKQuery
always seems to return 0 rows, although the unit_deal_point_delete table
is clearly not empty. No errors are written to the logs, only this:

Mar 3, 2011 11:23:49 AM org.apache.solr.handler.dataimport.DocBuilder 
collectDelta
INFO: Completed DeletedRowKey for Entity: unitdealpoints rows obtained : 0

I have tested this with versions 1.4.0 and 1.4.1, and the result is the
same: documents are not deleted.

What is the problem? Am I missing something?

Kind regards
--
Jerome Droz