Posted to solr-user@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2009/06/17 22:47:29 UTC

pk vs. uniqueKey with DIH delta-import

First - DIH has worked pretty well in a new customer engagement of  
ours.  We've easily imported tens of millions of records with no  
problem.  Kudos to the developers/contributors to DIH - it got us up  
and running quickly.  But now we're delving into more complexities and  
having some issues.

Now on to my current issue, doing a delta-import such that records  
marked as "deleted" in the database are removed from Solr using  
deletedPkQuery.

Here's a config I'm using against a mocked test database:

<dataConfig>
   <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
   <document name="tests">
     <entity name="test"
             pk="board_id"
             transformer="TemplateTransformer"
             deletedPkQuery="select board_id from boards where deleted = 'Y'"
             query="select * from boards where deleted = 'N'"
             deltaImportQuery="select * from boards where deleted = 'N'"
             deltaQuery="select * from boards where deleted = 'N'"
             preImportDeleteQuery="datasource:board">
       <field column="id" template="board-${test.board_id}"/>
       <field column="datasource" template="board"/>
       <field column="title" />
     </entity>
   </document>
</dataConfig>

Note that the uniqueKey in Solr is the "id" field.  And its value is a  
template board-<PK>.
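
For reference, the uniqueKey arrangement described above would correspond to a schema.xml fragment roughly like the one below. This is a minimal sketch: only the field names (id, datasource, title) and the choice of "id" as uniqueKey come from this message; the field types and the indexed/stored attributes are assumptions, since the actual schema was not posted.

```xml
<!-- Sketch only: field types and attributes are assumed, not taken from the real schema. -->
<fields>
  <!-- Holds the templated value, e.g. board-1, not the raw database pk -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <!-- Must be indexed so that preImportDeleteQuery="datasource:board" can match -->
  <field name="datasource" type="string" indexed="true" stored="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
</fields>

<!-- Solr identifies documents by this field, so deletes must use its value -->
<uniqueKey>id</uniqueKey>
```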

I noticed that the javadoc comment on DocBuilder#collectDelta says
"Note: In our definition, unique key of Solr document is the primary
key of the top level entity".  That isn't really a safe assumption:
here the uniqueKey value is derived from the pk but is not equal to it.

I also tried a deletedPkQuery of "select concat('board-',board_id)  
from boards where deleted = 'Y'", but got an NPE (relevant stack trace  
below).

It seems that deletedPkQuery only works if the pk and Solr's uniqueKey  
field use the same value.  Is that the case?  If this is the case  
we'll need to fix this somehow.  Any suggestions?

Thanks,
	Erik

stack trace from scenario mentioned above:
SEVERE: Delta Import Failed
java.lang.NullPointerException
	at org.apache.solr.handler.dataimport.SolrWriter.deleteDoc(SolrWriter.java:83)
	at org.apache.solr.handler.dataimport.DocBuilder.deleteAll(DocBuilder.java:275)
	at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:247)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
	at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)


Re: pk vs. uniqueKey with DIH delta-import

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
I have raised an issue and fixed it:
https://issues.apache.org/jira/browse/SOLR-1228

2009/6/18 Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com>:
> Apparently the row returns a null 'board_id'; your stack trace
> suggests this. Even once that is fixed, I guess it may not work,
> because you are storing the id as
>
> board-${test.board_id}
>
> and unless your query returns something like board-<some-id> it may
> not work for you.
>
> Anyway, I shall put in a fix in DIH to avoid this NPE.
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: pk vs. uniqueKey with DIH delta-import

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jun 18, 2009, at 4:51 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> Apparently the row returns a null 'board_id'

No.  I'm working with a test database situation with a single record,  
and I simply do a full-import, then change the deleted column to 'Y'  
and try a delta-import.  The deletedPkQuery returns a single result in  
that case, and the NPE came from when I made the query return board-1  
instead of just 1.

> Your stack trace suggests this. Even once that is fixed, I guess it
> may not work, because you are storing the id as
>
> board-${test.board_id}
>
> and unless your query returns something like board-<some-id> it may
> not work for you.
>
> Anyway, I shall put in a fix in DIH to avoid this NPE.

That fix didn't solve the NPE.  I still get the following stacktrace  
when I have deletedPkQuery="select concat('board-',board_id) from  
boards where deleted = 'Y'".   I presume it's looking for the pk  
column (board_id in my case) in the results of the deletedPkQuery.

SEVERE: Delta Import Failed
java.lang.NullPointerException
	at org.apache.solr.handler.dataimport.SolrWriter.deleteDoc(SolrWriter.java:83)
	at org.apache.solr.handler.dataimport.DocBuilder.deleteAll(DocBuilder.java:289)
	at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:247)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
	at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)

I changed to deletedPkQuery="select concat('board-',board_id) as  
board_id from boards where deleted = 'Y'" and got no NPE, but I also  
still haven't been able to get DIH to properly remove Solr documents  
that have been flagged as deleted in the database.
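
Written out in full, the aliased variant looks like the entity below. It is a sketch assembled from the config at the start of this thread, with only the deletedPkQuery changed. It avoids the NPE, but it has not been shown to actually delete the flagged documents.

```xml
<!-- Only deletedPkQuery differs from the original config: the concatenated
     value is aliased to the pk column name (board_id) so that a column with
     that name is present in the result row. -->
<entity name="test"
        pk="board_id"
        transformer="TemplateTransformer"
        deletedPkQuery="select concat('board-',board_id) as board_id from boards where deleted = 'Y'"
        query="select * from boards where deleted = 'N'"
        deltaImportQuery="select * from boards where deleted = 'N'"
        deltaQuery="select * from boards where deleted = 'N'"
        preImportDeleteQuery="datasource:board">
  <!-- uniqueKey value in Solr is the templated form, e.g. board-1 -->
  <field column="id" template="board-${test.board_id}"/>
  <field column="datasource" template="board"/>
  <field column="title" />
</entity>
```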

	Erik



Re: pk vs. uniqueKey with DIH delta-import

Posted by Lance Norskog <go...@gmail.com>.
https://issues.apache.org/jira/browse/SOLR-1191

describes a different problem, but I think Ali's solution applies here.

I tried 'select concat("",id) from table' and it hit the same
exception. I can't test now, but I think this is the solution:

select concat("prefix",id) AS ID

The JDBC code may be looking for a column named "id" in the query result,
rather than accepting an unnamed value?

On Thu, Jun 18, 2009 at 6:35 AM, Erik Hatcher <er...@ehatchersolutions.com> wrote:

> On Jun 18, 2009, at 4:51 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>> apparently the row return a null 'board_id'
>
> I replied "No" earlier, but of course you're right here.  The
> deletedPkQuery I originally used was not returning a board_id column.  And
> even if it did, that isn't the uniqueKey (id field) value.
>
> What about running the results of the deletedPkQuery through the same
> transformation process that indexing uses?  Only the field that matches
> Solr's uniqueKey setting would need to go through it.
>
>        Erik


-- 
Lance Norskog
goksron@gmail.com
650-922-8831 (US)

Re: pk vs. uniqueKey with DIH delta-import

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jun 18, 2009, at 4:51 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> Apparently the row returns a null 'board_id'

I replied "No" earlier, but of course you're right here.  The  
deletedPkQuery I originally used was not returning a board_id column.   
And even if it did, that isn't the uniqueKey (id field) value.

What about running the results of the deletedPkQuery through the same
transformation process that indexing uses?  Only the field that matches
Solr's uniqueKey setting would need to go through it.

	Erik


Re: pk vs. uniqueKey with DIH delta-import

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
Apparently the row returns a null 'board_id'; your stack trace
suggests this. Even once that is fixed, I guess it may not work,
because you are storing the id as

board-${test.board_id}

and unless your query returns something like board-<some-id> it may
not work for you.

Anyway, I shall put in a fix in DIH to avoid this NPE.




-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com