You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-dev@lucene.apache.org by "Erik Hatcher (JIRA)" <ji...@apache.org> on 2009/06/18 15:55:08 UTC

[jira] Created: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
-------------------------------------------------------------------------------------------

                 Key: SOLR-1229
                 URL: https://issues.apache.org/jira/browse/SOLR-1229
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
    Affects Versions: 1.4
            Reporter: Erik Hatcher


Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.

Here's a config I'm using against a mocked test database:

<dataConfig>
 <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
 <document name="tests">
   <entity name="test"
           pk="board_id"
           transformer="TemplateTransformer"
           deletedPkQuery="select board_id from boards where deleted = 'Y'"
           query="select * from boards where deleted = 'N'"
           deltaImportQuery="select * from boards where deleted = 'N'"
           deltaQuery="select * from boards where deleted = 'N'"
           preImportDeleteQuery="datasource:board">
     <field column="id" template="board-${test.board_id}"/>
     <field column="datasource" template="board"/>
     <field column="title" />
   </entity>
 </document>
</dataConfig>

Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.

I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722364#action_12722364 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

In my latest patch, I added a comment with a question about the use of the loop in deletelAll.  I don't see why it is needed.  Noble?

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated SOLR-1229:
--------------------------------

    Attachment: tests.patch

This patch contains two unit test files addressing the topics of this bug: 
1) database and Solr have different primary key names
2) solr primary key value is processed with a template

TestSqlEntityProcessorDelta.java does general tests of delta-import

TestSqlEntityProcessorDelta2.java does the same tests with a different solr id name and a template for the solr id value.

Also, there was a mistake in the test fixture which caused delta imports to always empty the index first.

I believe that these tests should work. Please make all changes necessary so that we have working unit tests for these features, and please change the wiki to match all syntax changes.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722128#action_12722128 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

FYI - some prior discussion on solr-user too: http://www.lucidimagination.com/search/document/b8ef8e2e9e1de005/pk_vs_uniquekey_with_dih_delta_import, about trying "select concat('board-',board_id)  
from boards where deleted = 'Y'", to no avail.




> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725926#action_12725926 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

The testcases are wrong.

take the case of testCompositePk_DeltaImport_empty()

the uniqueKey is 'solr_id' and there is no such field which maps to 'solr_id '

see the mapping
{code:xml}
<field column="id" name="id"/>
{code}




> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722354#action_12722354 ] 

Noble Paul edited comment on SOLR-1229 at 6/21/09 6:09 AM:
-----------------------------------------------------------

bq.But in order to call writer.deleteDoc(key) in DocBuilder#deleteAll it must use the value of the uniqueKey field, not of the pk one


Why can't you keep the uniqueKey same as pk  as shown below(it is not used anywhere else (yes there is one other place which is gonna be deprecated) ) . That should solve the problem. DIH is agnostic of the solr primary key . The point is that , the key name does not matter only the value matters . As long as the value is correct, delete should work automatically . 

the following should be just fine
{code:xml}
<entity name="test"
pk="id"
...
{code}



      was (Author: noble.paul):
    bq.But in order to call writer.deleteDoc(key) in DocBuilder#deleteAll it must use the value of the uniqueKey field, not of the pk one


Why can't you keep the uniqueKey same as pk  (it is not used anywhere else (yes there is one other place which is gonna be deprecated) ) . That should solve the problem. DIH is agnostic of the solr primary key . The point is that , the key name does not matter only the value matters . As long as the value is correct, delete should work automatically . 


  
> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721349#action_12721349 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

there is a huge problem w/ the current implementation.
The whole delta-import process is built like an after thought. I wish to revamp the whole design. so that all the rows returned, deletedPkQuery or deltaQuery etc should also go through the transformations

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732213#action_12732213 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------

Something is documented on the wiki but not used: multiple PKs in one entity.

On the wiki page, see the config file after "Writing a huge deltaQuery" - there is a attribute: 
{pk="ITEM_ID, CATEGORY_ID"}
There is code to parse this in DataImporter.InitEntity() and store the list in Entity.primaryKeys. But this list of PKs is never used.

I think the use case for this is that the user requires more fields besides the uniqueKey for a document. Is this right? This is definitely on my list of must-have features. The second field may or may not be declared "required" in the schema, so looking at the schema is not good enough. The field has to be declared "required" in the dataconfig.

Lance

(I split this comment out from the previous since they are not related. However, this is another feature inside the same bit of code upon which we are endlessly chewing.)

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1229:
-----------------------------

    Attachment: SOLR-1229.patch

Erik plz let me know if this helps.



> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>         Attachments: SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722359#action_12722359 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

If the pk attribute is not used, then let's just remove it altogether.  DIH should NOT be agnostic to Solr's uniqueKey field though - and it would be silly to mandate users duplicate the uniqueKey setting as DIH pk values when it can just be gotten from the schema.  So -1 to misusing a pk-named attribute this way, when the most intuitive interpretation would be that pk is the Primary Key of the table being indexed.



> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722922#action_12722922 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

I plan on committing this patch later this week.  It fixes the problems we've had.   Please review and comment further if there are still holes... but we really should be discussing this with unit tests.  I hope to see some of those this week from Lance.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740433#action_12740433 ] 

Noble Paul edited comment on SOLR-1229 at 8/7/09 12:18 AM:
-----------------------------------------------------------

The issue is reopened because it breaks an existing functionality. The latest patch fixes that

      was (Author: noble.paul):
    The issue is not reopened because it breaks an existing functionality. The latest patch fixes that
  
> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724471#action_12724471 ] 

Shalin Shekhar Mangar commented on SOLR-1229:
---------------------------------------------

Erik, the most common use-case as far as I have seen is that the primary key in tables is different from the uniqueKey in Solr (think about multiple tables with each having a root-entity). Yes, the pk can be transformed (or one can alias it in sql) but this being the most common use-case, I feel pk should be kept as-is.

Let me give a few possible cases 
# The name of table's primary key is different from solr's unique key name and the deletedPkQuery returns only one column (most common use-case)
# The name of table's primary key is different from solr's unique key name and the deletedPkQuery returns multiple columns
# The name of table's primary key is same as solr's unique key name and the deletedPkQuery returns only one column
# The name of table's primary key is same as solr's unique key name and the deletedPkQuery returns multiple columns

For #1 'pk' does not matter because we can use the single columns coming back from deletedPkQuery
For #2, 'pk' is required otherwise the user is forced to use a transformer (or alias). For non-database use-cases (there is none right now), there is no aliasing so the user must write a transformer
For #3, neither 'pk' nor 'uniqueKey' matters
For #4, we can use solr's uniqueKey name (I guess this is your use-case?). I think that this is a rare use-case.

If at all, we decide to go with uniqueKey only, the right way to do that would be to use the corresponding column-mapping for looking up the unique key. For the example below, we should use "db-id" to lookup in the map returned by deletedPkQuery if solr-id is the uniqueKey in solr:
{code:xml}
<field column="db-id" name="solr-id" />
{code}

However, even though the above approach is the 'right' one, it is very tricky and hard to explain to users. Also, there could be multiple columns mapped to same solr key (think about template for unique key for 'types' of documents based on a flag column). This may be very error-prone.

What do you think?

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721264#action_12721264 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

Maybe the pk field returned from deletedPkQuery should be run through the transformation process on the uniqueKey (id in this case) field to get the actual value??

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723393#action_12723393 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

The pk attribute currently is used to track only modified/deleted records during a delta-import.  But only the uniqueKey setting makes sense, and that will get mapped from the transformations applied.  I fail to see how removing pk (and of course refactoring its current uses to the uniqueKey value) will cause problems.  The only thing that matters here is that transformations are applied such that the uniqueKey field value is accurate.  There is nothing in what I'm suggesting here that makes the DB primary key field match the Solr uniqueKey field name.  

Again, tests are really required so we can be sure we have the bases covered, but it's been tested pretty thoroughly that my patch fixes a serious issue.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul reassigned SOLR-1229:
--------------------------------

    Assignee: Noble Paul

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-1229:
-------------------------------

    Attachment: SOLR-1229.patch

Noble - that looks like the right thing to add for delta queries, but that didn't help with the deleted one.  I've attached a patch that fixes things in my limited test.  

We really need to add some unit tests for this - tricky business though.  Lance - do you have some unit tests to add that shows it broken and then fixed with this patch?

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul resolved SOLR-1229.
------------------------------

    Resolution: Fixed

committed :r807537

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732433#action_12732433 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

bq.1) that this column must exist for a document to be generated, and 
no,
2) that this entity is the level where documents are created. Is this true?
no: rootEntity decides that

pk is currently used for delete. XPathEntityProcessor also uses that to decode if this row is a candidate for a doc

multiple pk values were supported for deltas (to construct the deltaImportQuery when it is not present  ). That is going to be deprecated from 1.4. 

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-1229:
-------------------------------

    Attachment: SOLR-1229.patch

Here's another patch that removes references to pk from DocBuilder#deleteAll.  

I've just looked for uses of pk myself, and indeed it seems it should be removed.  DIH must know and use the uniqueKey field from the schema for updates/deletes - there is no reason for DIH configs to have to specify it again.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732213#action_12732213 ] 

Lance Norskog edited comment on SOLR-1229 at 7/16/09 3:38 PM:
--------------------------------------------------------------

Something is documented on the wiki but not used: multiple PKs in one entity.

On the wiki page, see the config file after "Writing a huge deltaQuery" - there is a attribute: 
{pk="ITEM_ID, CATEGORY_ID"}
There is code to parse this in DataImporter.InitEntity() and store the list in Entity.primaryKeys. But this list of PKs is never used.

I think the use case for this is that the user requires more fields besides the uniqueKey for a document. Is this right? This is definitely on my list of must-have features. The second field may or may not be declared "required" in the schema, so looking at the schema is not good enough. The field has to be declared "required" in the dataconfig.

Lance

(I split this comment out from the previous since they are not related. However, this is another feature inside the same bit of code upon which we are ceaselessly chewing.)

      was (Author: lancenorskog):
    Something is documented on the wiki but not used: multiple PKs in one entity.

On the wiki page, see the config file after "Writing a huge deltaQuery" - there is a attribute: 
{pk="ITEM_ID, CATEGORY_ID"}
There is code to parse this in DataImporter.InitEntity() and store the list in Entity.primaryKeys. But this list of PKs is never used.

I think the use case for this is that the user requires more fields besides the uniqueKey for a document. Is this right? This is definitely on my list of must-have features. The second field may or may not be declared "required" in the schema, so looking at the schema is not good enough. The field has to be declared "required" in the dataconfig.

Lance

(I split this comment out from the previous since they are not related. However, this is another feature inside the same bit of code upon which we are endlessly chewing.)
  
> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-1229:
-------------------------------

    Description: 
Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.

Here's a config I'm using against a mocked test database:

{code:xml}
<dataConfig>
 <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
 <document name="tests">
   <entity name="test"
           pk="board_id"
           transformer="TemplateTransformer"
           deletedPkQuery="select board_id from boards where deleted = 'Y'"
           query="select * from boards where deleted = 'N'"
           deltaImportQuery="select * from boards where deleted = 'N'"
           deltaQuery="select * from boards where deleted = 'N'"
           preImportDeleteQuery="datasource:board">
     <field column="id" template="board-${test.board_id}"/>
     <field column="datasource" template="board"/>
     <field column="title" />
   </entity>
 </document>
</dataConfig>
{code}

Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.

I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

  was:
Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.

Here's a config I'm using against a mocked test database:

<dataConfig>
 <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
 <document name="tests">
   <entity name="test"
           pk="board_id"
           transformer="TemplateTransformer"
           deletedPkQuery="select board_id from boards where deleted = 'Y'"
           query="select * from boards where deleted = 'N'"
           deltaImportQuery="select * from boards where deleted = 'N'"
           deltaQuery="select * from boards where deleted = 'N'"
           preImportDeleteQuery="datasource:board">
     <field column="id" template="board-${test.board_id}"/>
     <field column="datasource" template="board"/>
     <field column="title" />
   </entity>
 </document>
</dataConfig>

Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.

I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.


> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725126#action_12725126 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

bq. Also, I recommend that a feature be removed: creating a deltaImportQuery if the user does not declare one. From the wiki: 

It will be removed. But, we cannot remove it from the wiki until we release 1.4. 

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724780#action_12724780 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

In all of those cases, as long as what is returned from the query and run through transformations ends up with a key in the map that is the same as the uniqueKey setting in schema.xml, then all is fine.  I still don't see a need to set pk="solr-id".  Won't there always be a uniqueKey-named key in that map after transformations are applied?  uniqueKey definitely matters... that's the field that must be used for deletions, and what I'm consistently seeing mentioned here is that you want to duplicate that by saying pk="<uniqueKeyFieldName>", which is unnecessary duplication.  When would you set pk to anything else?

It'll be later next week at the earliest, but I hope to get some unit tests contributed so we can discuss this topic through tests rather than prose.

My use case is exactly the config at the top of this issue, where the uniqueKey value is a templated transformation (because of multiple DIH configurations bringing in various data sources, so the  unique key value must be fabricated to be guaranteed to be unique across different datasources that may have the same primary keys in the databases) - this corresponds to your #1.


> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724990#action_12724990 ] 

Erik Hatcher edited comment on SOLR-1229 at 6/28/09 1:23 PM:
-------------------------------------------------------------

Lance - the SQL query could return more than one column.   A bit non-sensical, but it could be specified that way.  SELECT * FROM BOARDS WHERE deleted='Y' for example.

And no, it isn't supposed to return Solr uniqueKey id's!!!   That's the main crux of the dilemma here... how to jive what is returned from a SQL resultset with Solr unique keys.

      was (Author: ehatcher):
    Lance - the SQL query could return more than one column.   A bit non-sensical, but it could be specified that way.  SELECT * FROM BOARDS WHERE deleted='Y' for example.
  
> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Jun 24, 2009 at 4:05 PM, Erik Hatcher <er...@ehatchersolutions.com>wrote:

>
> But... since transformations are applied, solr_id will be in that map too,
> which is the uniqueKey value.  Problem solved!  :)
>

Erik, the most common use-case is the name of the table's primary key being
different from Solr's uniqueKey. We should not force the user to write a
Transformer to convert between them. This was the motivation behind keeping
the pk attribute.

I don't think pk should be removed at all.

-- 
Regards,
Shalin Shekhar Mangar.

Re: [jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jun 24, 2009, at 12:28 AM, Noble Paul (JIRA) wrote:
> let me explain with the above example .
>
> {code:java}
> // this code is  taken from your patch
> SchemaField uniqueKeyField =  
> dataImporter.getSchema().getUniqueKeyField();
> // now the value of uniqueKeyField is 'solr_id'
> if (uniqueKeyField == null) return;
> Object key = map.get(uniqueKeyField.getName());
> //the map contains {"db_id"-> "12345"}
> //so key == null; and it will do nothing and return
> // on the contrary if you set the pk="db_id" , then it works

But... since transformations are applied, solr_id will be in that map  
too, which is the uniqueKey value.  Problem solved!  :)

	Erik


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723427#action_12723427 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

hi Erik it works in your case , But the usecase I have given above (which is a more normal usecase)

let me explain with the above example . 

{code:java}
// this code is  taken from your patch
 SchemaField uniqueKeyField = dataImporter.getSchema().getUniqueKeyField();
// now the value of uniqueKeyField is 'solr_id'
 if (uniqueKeyField == null) return;
 Object key = map.get(uniqueKeyField.getName());
 //the map contains {"db_id"-> "12345"}
 //so key == null; and it will do nothing and return
 // on the contrary if you set the pk="db_id" , then it works





{code}

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722495#action_12722495 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------

Yup, I've deleted my patch. After a lot more examination of the code, and combinatoric testing, I see why it works and is the right fix for this incarnation of the DIH.

And, yes, unit tests for more of these options are on the way!

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739424#action_12739424 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

Do you have a failing test case?

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739444#action_12739444 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

Nope. The last current trunk  has changed an existing feature. The pk  attribute is ignored. that is why I have reopened it in the first place

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725138#action_12725138 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

bq. I believe that these tests should work.

hi Lance, do the testcases run.? or we require some fix to make it work? I applied the patch and the testcases are failing

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated SOLR-1229:
--------------------------------

    Attachment: SOLR-1229.patch

There are two bugs in this issue:

1) The deletedPkQuery must say "select field from table". It may not say "select expression" or "select field AS FIELD from table". The former does not work and the latter may work with some databases and not others.  In other words, the first example works and the others fail:

    deletedPkQuery="select id from item where EXPRESSION"
    deletedPkQuery="select concat('',id) from item where EXPRESSION"
    deletedPkQuery="select id from item AS ID where EXPRESSION"
    deletedPkQuery="select id from item AS ITEM.ID where EXPRESSION"

2) If a template is used to format the contents of the Solr primary key, that template is declared at the <field> level and is not visible in the parent <entity> level. Since the deletedPkQuery is declared in the <entity> level, it cannot see the field's template. Thus the template cannot be applied to the results of the deletedPkQuery. Any formatting in the <field> template must be duplicated (in SQL syntax) in the deletedPkQuery. 

This is workaround that achieves the goal of #2: if the Solr primary key is declared as

    <field name="id" template="prefix-${item.id}"

then the deletedPkQuery must be declared as

    deletedPkQuery="select concat('prefix',id) from item where EXPRESSION"

Because of bug #1, it is not possible to use this workaround.

The attached patch fixes bug #1, making it possible to use the workaround. It is Erik's second patch but only with the fix for bug #1. It does not create an automatic way for results of deletedPkQuery to be formatted with the primary key's template.



> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738816#action_12738816 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

I guess the latest patch can be committed and this issue can be closed

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722956#action_12722956 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

bq.he pk attribute seems like it should simply be removed 

The pk attribute IS REQUIRED  for this use case. I guess the solution to your problem is that you set the pk correct as the name you find in the "source row" map. if we remove it every user will be forced to keep the uniquekey same as the name it has in the source DB.



> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722354#action_12722354 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

bq.But in order to call writer.deleteDoc(key) in DocBuilder#deleteAll it must use the value of the uniqueKey field, not of the pk one


Why can't you keep the uniqueKey same as pk  (it is not used anywhere else (yes there is one other place which is gonna be deprecated) ) . That should solve the problem. DIH is agnostic of the solr primary key . The point is that , the key name does not matter only the value matters . As long as the value is correct, delete should work automatically . 



> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722956#action_12722956 ] 

Noble Paul edited comment on SOLR-1229 at 6/22/09 9:45 PM:
-----------------------------------------------------------

bq.he pk attribute seems like it should simply be removed 

The pk attribute IS REQUIRED  for this use case. I guess the solution to your problem is that you set the pk correct as the name you find in the "source row" map. if we remove it every user will be forced to keep the uniquekey same as the name it has in the source DB.

Take the following usecase

{code:xml}
   <entity name="test"
           pk="db_id"
           deletedPkQuery="select db_id from boards where deleted = 'Y'">
     <field column="db_id" name="solr_id"/>
   </entity>
{code}

elsewhere in solrconfig.xml

{code:xml}
<uniqueKey>solr_id</uniqueKey>
{code}



if you omit that pk attributethis will fail

      was (Author: noble.paul):
    bq.he pk attribute seems like it should simply be removed 

The pk attribute IS REQUIRED  for this use case. I guess the solution to your problem is that you set the pk correct as the name you find in the "source row" map. if we remove it every user will be forced to keep the uniquekey same as the name it has in the source DB.


  
> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher resolved SOLR-1229.
--------------------------------

    Resolution: Fixed

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722473#action_12722473 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

bq.If the pk attribute is not used, then let's just remove it altogether.....

hi Erik. you got me wrong.   Let me try to explain. There are two potential names for a field one is the 'column' and other is the 'name'

'column' is the name of the field in the source or in DIH. The 'name' is the name of that field in Solr.  DIH uses the 'name' attribute only when it writes the field to Solr. 

The relation ship between 'pk' attribute and the '<uniqueKey>' in Solr  is same . The distinction is important . Otherwise the user will be forced to use the same name in both the db and solr (assuming no transformations are done). 


> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722333#action_12722333 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

Erik, the fix is not right

see this
{code}
key = map.get(dataImporter.getSchema().getUniqueKeyField().getName());
{code}

The name of the uniqueKey field in the scema and the one you have in the map does not have to be same. DIH really gives you an option for it to be different. The attribute 'pk' is only used for this purpose. 

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Reopened: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul reopened SOLR-1229:
------------------------------


I guess the fix was not done

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Reopened: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher reopened SOLR-1229:
--------------------------------


Noble - I don't really appreciate you committing a partial patch of this.  What you committed requires us to set pk="id", which is just silly and nonsensical.  I'm opening this back up to continue to fix this awkward area of DIH.  

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722258#action_12722258 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------

Someone with magic powers, please mark this as "to be fixed in Solr 1.4".

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>         Attachments: SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739930#action_12739930 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------


There are a couple of other features to remove in this code:

1) multiple primary keys
2) deltaImportQuery is created automatically if it is not given in the dataconfig.xml file

Do we want to attack all of those in this issue?





> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739948#action_12739948 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

bq. Do we want to attack all of those in this issue?

we must remove them . Let us have a separate issue for them

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Reopened: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul reopened SOLR-1229:
------------------------------


this fix does not allow the user to override the pk key name from the data-config.xml . The deduced key takes precedence. In most cases the pk attribute is not required, but when it is mentioned it should take precedence

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher resolved SOLR-1229.
--------------------------------

    Resolution: Fixed
      Assignee: Erik Hatcher  (was: Noble Paul)

Committed to r792963.  Thanks Lance and Noble for iterating on this.  We've at least got it working well enough for our current needs.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746546#action_12746546 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------

Noble, what is this broken functionality? 

I just did a full update and all unit tests work. If you describe the problem I will do the unit test and fix it.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730201#action_12730201 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------

Noble-

With this change to DocBuilder.deleteAll(), the Delta2 test runs. With the code in the other order, the two delete operations do not work because the template is not applied. 

Please alter this config file (from unit test Delta2) to embody the use case you mention. The database pk is 'id' and the schema pk is 'solr_id'.  And please describe the primary key design of the database and solr schemas.

<dataConfig>
  <document>
     <entity name="x" pk="id" transformer="TemplateTransformer"          
         query="select * from x"  
         deletedPkQuery="select id from x where last_modified > NOW AND deleted='true'" 
         deltaImportQuery="select * from x where id='${dataimporter.delta.id}'"  
         deltaQuery="select id from x where last_modified > NOW">
            <field column="solr_id" template="prefix-${x.id}"/>
               <entity name="y" query="select * from y where y.A='${x.id}'">
                  <field column="desc" />
               </entity>
      </entity>
   </document>
</dataConfig>


> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722345#action_12722345 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

Noble - they do have to be the same, and that is why applying the transformations as you patched is part of the right answer.  But in order to call writer.deleteDoc(key) in DocBuilder#deleteAll it must use the value of the uniqueKey field, not of the pk one.  That is the crux of the issue here.  In my case, in the example at the top of this issue, the pk field is board_id with a value of "1", and the id field after applying transformations is "board-1".  The delete-by-id to Solr must be done using value "board-1" and that is only obtained by looking up the uniqueKey field (id in this example) from the schema and pulling it from the map.

The latest patch I supplied worked in my test case and I analyzed it through a debugger. 


> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated SOLR-1229:
--------------------------------

    Attachment:     (was: SOLR-1229.patch)

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727914#action_12727914 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

Erik, did you have a chance to look at the fix?

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724987#action_12724987 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------

Shalin, how can a deletedPkQuery return multiple columns? It is supposed to return a list of Solr uniqueKeys. These are then sent in with the Solr delete-by-id operation.



> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725810#action_12725810 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------

I apologize, I did not put it clearly. 

Yes, these tests do not work against the current code base. I believe they are correct tests for the features we have talked about in this issue, SOLR-1229. The code in these tests should work, when this bug is resolved. They will not work with the partial patch currently committed.

These tests are variations of the original test set. They handle two cases: 1) the solr uniqueKey has a different name than the DB primary key, and 2) the solr uniqueKey's value is a transformation of the DB primary key's value.

In other words, if your code actually works, and passes your by-hand tests, it should then pass all of these tests. If there is something wrong in these tests, please fix it.

Should you wish to write further tests, the pre/post remove query features at present have no coverage.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724990#action_12724990 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

Lance - the SQL query could return more than one column.   A bit non-sensical, but it could be specified that way.  SELECT * FROM BOARDS WHERE deleted='Y' for example.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1229:
-----------------------------

    Attachment: SOLR-1229.patch

Erik, the new fix

This eliminates the need for 'pk' most of the cases.  It tries to identify the corresponding column mapping for the solr uniqueKey.  But the user can override this by specifying the pk attribute if the guess is not going to be correct

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722443#action_12722443 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

I've retested my last patch and all works fine in my single record simple test case using the exact configuration (and MySQL DB) as posted here.  I do a full-import, one record goes in.  I do change deleted to 'Y' and do a delta-import and the document is removed from Solr.

Lance - please fall back to my last patch - the applyTransformer call in EntityProcessorWrapper#nextDeletedRow that I added is crucial.  It gives an id=board-1 value which is then used for deletion in DocBuilder#deleteAll.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747175#action_12747175 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------

The latest patch works against the latest SVN checkout. Please commit it and close this issue.

Thanks!

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746584#action_12746584 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

bq.what is this broken functionality?

Till this fix the user provided 'pk' was always honoured. Now, the derived pk can never be overridden. My latest patch has the fix and corresponding changes to the tests. 

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-1229:
-------------------------------

    Fix Version/s: 1.4

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725629#action_12725629 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

Lance - I too get failures with your tests.  Here's one example:

junit.framework.AssertionFailedError: query failed XPath: //*[@numFound='0'] xml response was: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="rows">20</str><str name="start">0</str><str name="q">desc:hello OR XtestCompositePk_DeltaImport_replace_nodelete</str><str name="qt">standard</str><str name="version">2.2</str></lst></lst><result name="response" numFound="1" start="0"><doc><arr name="desc"><str>hello</str></arr><str name="id">1</str><date name="timestamp">2009-06-30T13:55:58.2Z</date></doc></result>
</response>

	at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:182)
	at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:172)
	at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta.testCompositePk_DeltaImport_replace_nodelete(TestSqlEntityProcessorDelta.java:194)




> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1229:
-----------------------------

    Attachment: SOLR-1229.patch

ideally , for your usecases , the pk attribute is not required. So i have removed it. Now it uses the user provided pk if it is not present it falls back to the solr schema uniqueKey

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740433#action_12740433 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

The issue is not reopened because it breaks an existing functionality. The latest patch fixes that

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724803#action_12724803 ] 

Noble Paul commented on SOLR-1229:
----------------------------------

bq.don't really appreciate you committing a partial patch of this

I am sorry about this. But, my fix was good enough for the bug's description. I was planing to open another issue for this.

Either we should change the descrption to , deprecate 'pk' or we can open another issue for the same.
DIH works as intented to with this fix and the deprecation can be take up .

bq.Won't there always be a uniqueKey-named key in that map after transformations..

The assumption is that transformation is always done. My experience with ''DIH support says that users don't use it always.Transformer is just a addon .

Anyway , the right fix would be to find out what is the right mapping in DIH for the given solr <uniqueKey> and use it as a pk. That can definitely be another jira issue

your thoughts?

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724925#action_12724925 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------

Also, I recommend that a feature be removed: creating a deltaImportQuery if the user does not declare one. From the wiki:

"deltaImportQuery : (Only used in delta-import) . If this is not present , DIH tries to construct the import query by(after identifying the delta) modifying the 'query' (this is error prone). There is a namespace ${dataimporter.delta.<column-name>} which can be used in this query. e.g: select * from tbl where id=${dataimporter.delta.id}   Solr1.4."

The user should just be required to supply the deltaImportQuery. Now that the deltaImportQuery declaration exists, this "helper feature" is just a trap for the unwary. The phrase "this is error prone" is always a bad sign.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722441#action_12722441 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

Lance - your patch misses the all important EntityProcessorWrapper.java changes that both Noble and I contributed in our patches.  That's needed to make this work!

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731780#action_12731780 ] 

Lance Norskog edited comment on SOLR-1229 at 7/16/09 3:35 PM:
--------------------------------------------------------------

Ok - these run. Thanks.

Just to make sure I understand. The 'pk' attribute declares 2 things:  
    1) that this column must exist for a document to be generated, and 
    2) that this entity is the level where documents are created. Is this true?

tmpid appears as an unused name merely so that ${x.id} is sent into solr_id. Maybe name="" would be more clear for this purpose?

Lance





      was (Author: lancenorskog):
    Ok - these run. Thanks.

Just to make sure I understand. The 'pk' attribute declares 2 things:  
    1) that this column must exist for a document to be generated, and 
    2) that this entity is the level where documents are created. Is this true?

tmpid appears as an unused name merely so that ${x.id} is sent into solr_id. Maybe name="" would be more clear for this purpose?

Something is documented on the wiki but not used: multiple PKs in one entity.

On the wiki page, see the config file after "Writing a huge deltaQuery" - there is a attribute: 
{{{pk="ITEM_ID, CATEGORY_ID"}}}
There is code to parse this in DataImporter.InitEntity() and store the list in Entity.primaryKeys. But the list of PKs is never used. 

I think the use case for this is that the user requires more fields besides the uniqueKey for a document.  Is this right? This is definitely on my list of must-have features. The second field may or may not be declared "required" in the schema, so looking at the schema is not good enough. The field has to be declared "required" in the dataconfig.

Lance




  
> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722921#action_12722921 ] 

Erik Hatcher commented on SOLR-1229:
------------------------------------

Noble - I'm not following your explanation here.  The pk attribute seems like it should simply be removed (and the usages of it refactored to use Solr's uniqueKey setting when doing updated/deleted activities - that's the key to the Solr docs after all).

Regarding 'column'/'name' - there are cases where it is quite confusing - where column is used to specify the Solr field name - and that feels wrong to me.  'name' should always be the Solr name, and if it is a templated field, it should be 'name'/'template' not 'column'/'template', for example.   



> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740385#action_12740385 ] 

Lance Norskog commented on SOLR-1229:
-------------------------------------

The Delta2 test handles both of the problems in this issue. Should this issue be closed?


> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul resolved SOLR-1229.
------------------------------

    Resolution: Fixed

committed r788587

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-1229:
-------------------------------

    Attachment: SOLR-1229.patch

Latest patch (adapted by Lance).  All tests pass and this will be committed shortly.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1229
>                 URL: https://issues.apache.org/jira/browse/SOLR-1229
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Erik Hatcher
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> <dataConfig>
>  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
>  <document name="tests">
>    <entity name="test"
>            pk="board_id"
>            transformer="TemplateTransformer"
>            deletedPkQuery="select board_id from boards where deleted = 'Y'"
>            query="select * from boards where deleted = 'N'"
>            deltaImportQuery="select * from boards where deleted = 'N'"
>            deltaQuery="select * from boards where deleted = 'N'"
>            preImportDeleteQuery="datasource:board">
>      <field column="id" template="board-${test.board_id}"/>
>      <field column="datasource" template="board"/>
>      <field column="title" />
>    </entity>
>  </document>
> </dataConfig>
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a template board-<PK>.
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In our definition, unique key of Solr document is the primary key of the top level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.