Posted to solr-user@lucene.apache.org by Joel Nylund <jn...@yahoo.com> on 2009/11/23 20:49:21 UTC

help with dataimport delta query

Hi, I have Solr all working nicely, except I'm trying to get deltas to
work on my data import handler.

Here is a simplification of my data import config. I have a table
called "Book" which has categories; I'm doing subqueries for the
category info and calling a JavaScript helper. This all works
perfectly for the regular query.

I added these lines for the delta stuff:

	deltaImportQuery="SELECT f.id,f.title
			FROM Book f
			f.id='${dataimporter.delta.job_jobs_id}'"
	deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate > '${dataimporter.last_index_time}'"  >
		
Basically I'm trying to get the rows whose lastModifiedDate is newer than the
last index (or delta index).

I run:
http://localhost:8983/solr/dataimport?command=delta-import

And it says in logs:

Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport
INFO: Starting Delta Import
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=delta-import} status=0 QTime=0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: category
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: category rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: category rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: category
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: item
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: item rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: item rows obtained : 0
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: item
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:0.21

But the browser says no documents were added/modified (even though one
record in the db is a match).

Is there a way to turn on debugging so I can see the queries the DIH is
sending to the db?

Any other ideas of what I could be doing wrong?

thanks
Joel


<document name="doc">
    <entity name="item"
        query="SELECT f.id, f.title
            FROM Book f
            WHERE f.inMyList=1"
        deltaImportQuery="SELECT f.id,f.title
            FROM Book f
            f.id='${dataimporter.delta.job_jobs_id}'"
        deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate > '${dataimporter.last_index_time}'">

        <field column="id" name="id" />
        <field column="title" name="title" />
        <entity name="category" transformer="script:SplitAndPrettyCategory"
            query="select fc.bookId, group_concat(cr.name) as categoryName,
                from BookCat fc
                where fc.bookId = '${item.id}' AND
                group by fc.bookId">
            <field column="categoryType" name="categoryType" />
        </entity>
    </entity>
</document>



Re: help with dataimport delta query

Posted by Joel Nylund <jn...@yahoo.com>.
Got to love it when Yahoo thinks your own mail is spam. Does anyone have
any ideas how to get logging to work with 1.4?

I went to the admin panel and set all logging to finest.

In my Jetty stdout I see no SQL for any of the dataimport handler
runs. I see:

Nov 23, 2009 9:26:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 6
Nov 23, 2009 9:26:32 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity category with URL: jdbc:mysql://localhost/feeddb
Nov 23, 2009 9:26:32 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 5


But no SQL. From looking at the source, it looks like it should be
logging the SQL if I'm in debug mode.

Any ideas? I think I am losing my mind.

My full import works, but the delta does nothing.

thanks
Joel





Re: help with dataimport delta query

Posted by Joel Nylund <jn...@yahoo.com>.
Thanks, that was it. Well, really this part:

${dataimporter.delta.job_jobs_id}

I thought the job_jobs_id was part of the DIH, but I guess it was just the example, duh!

thanks
Joel


--- On Tue, 11/24/09, Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com> wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com>
> Subject: Re: help with dataimport delta query
> To: solr-user@lucene.apache.org
> Date: Tuesday, November 24, 2009, 12:15 AM
> I guess the field names do not match:
> in the deltaQuery you are selecting the field id,
> 
> and in the deltaImportQuery you use the field as
> ${dataimporter.delta.job_jobs_id}.
> I guess it should be ${dataimporter.delta.id}.

Re: help with dataimport delta query

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
I guess the field names do not match:
in the deltaQuery you are selecting the field id,

and in the deltaImportQuery you use the field as
${dataimporter.delta.job_jobs_id}.
I guess it should be ${dataimporter.delta.id}.
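
For reference, here is a rough sketch of how the item entity might look with the variable name fixed. Each column returned by the deltaQuery is exposed to the deltaImportQuery as ${dataimporter.delta.<column>}, so selecting id gives ${dataimporter.delta.id}. The sketch also assumes that the deltaImportQuery was meant to have a WHERE keyword and that the fm alias in the deltaQuery was meant to be f. Neither is confirmed in the thread. The category sub-entity is left out for brevity.

```xml
<document name="doc">
  <!-- Assumption: "WHERE" was dropped from deltaImportQuery when the config was simplified. -->
  <!-- Assumption: the "fm" alias in deltaQuery should be "f", matching the alias on Book. -->
  <!-- The delta variable is named after the column selected by deltaQuery (id). -->
  <entity name="item"
          query="SELECT f.id, f.title
                 FROM Book f
                 WHERE f.inMyList=1"
          deltaQuery="SELECT f.id
                      FROM Book f
                      WHERE f.inMyList=1
                        AND f.lastModifiedDate &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="SELECT f.id, f.title
                            FROM Book f
                            WHERE f.id='${dataimporter.delta.id}'">
    <field column="id" name="id" />
    <field column="title" name="title" />
  </entity>
</document>
```

If the deltaQuery returned the key under another column name, the variable in the deltaImportQuery would need to use that name instead.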

On Tue, Nov 24, 2009 at 1:19 AM, Joel Nylund <jn...@yahoo.com> wrote:
> Hi, I have solr all working nicely, except im trying to get deltas to work
> on my data import handler
>
> Here is a simplification of my data import config, I have a table called
> "Book" which has categories, im doing subquries for the category info and
> calling a javascript helper. This all works perfectly for the regular query.
>
> I added these lines for the delta stuff:
>
>        deltaImportQuery="SELECT f.id,f.title
>                        FROM Book f
>                        f.id='${dataimporter.delta.job_jobs_id}'"
>                deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND
> lastModifiedDate > '${dataimporter.last_index_time}'"  >
>
> basically im trying to rows that lastModifiedDate is newer than the last
> index (or deltaindex).
>
> I run:
> http://localhost:8983/solr/dataimport?command=delta-import
>
> And it says in logs:
>
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> INFO: Starting Delta Import
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> doDelta
> INFO: Starting delta collection.
> Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=delta-import}
> status=0 QTime=0
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Running ModifiedRowKey() for Entity: category
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed ModifiedRowKey for Entity: category rows obtained : 0
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed DeletedRowKey for Entity: category rows obtained : 0
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed parentDeltaQuery for Entity: category
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Running ModifiedRowKey() for Entity: item
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed ModifiedRowKey for Entity: item rows obtained : 0
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed DeletedRowKey for Entity: item rows obtained : 0
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed parentDeltaQuery for Entity: item
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> doDelta
> INFO: Delta Import completed successfully
> Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder
> execute
> INFO: Time taken = 0:0:0.21
>
> But the browser says no documents added/modified (even though one record in
> db is a match)
>
> Is there a way to turn debugging so I can see the queries the DIH is sending
> to the db?
>
> Any other ideas of what I could be doing wrong?
>
> thanks
> Joel
>
>
> <document name="doc">
>    <entity name="item"
>      query="SELECT f.id, f.title
>                FROM Book f
>                WHERE f.inMyList=1"
>                deltaImportQuery="SELECT f.id,f.title
>                        FROM Book f
>                        f.id='${dataimporter.delta.job_jobs_id}'"
>                deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND
> lastModifiedDate > '${dataimporter.last_index_time}'"  >
>
>           <field column="id" name="id" />
>           <field column="title" name="title" />
>                <entity name="category"
> transformer="script:SplitAndPrettyCategory" query="select fc.bookId,
> group_concat(cr.name) as categoryName,
>                 from BookCat fc
>                 where fc.bookId = '${item.id}' AND
>                 group by fc.bookId">
>                 <field column="categoryType" name="categoryType" />
>                 </entity>
>    </entity>
>   </document>
>
>
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com