Posted to solr-user@lucene.apache.org by gadelkareem <ga...@gmail.com> on 2018/04/06 01:31:54 UTC

Data import batch mode for delta

Why does the deltaImportQuery use "where id='${dataimporter.id}'" instead of
something like "where id IN ('${dataimporter.id}')"?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Data import batch mode for delta

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/16/2018 7:32 PM, gadelkareem wrote:
> I cannot complain, because it has actually worked well for me so far, but...
>
> I still do not understand: if Solr already paginates the results from the
> full import, why not do the same for the delta? It is almost the same query:
> `select id from t where t.lastmod > ${solrTime}`
> `select * from t where id IN ${dataimporter.ids} limit 1000 offset 0`
> and so on.

Solr does not paginate SQL queries made by the dataimport handler 
(DIH).  It sends the query exactly as it is configured in the DIH config.

Thanks,
Shawn


Re: Data import batch mode for delta

Posted by gadelkareem <ga...@gmail.com>.
Thanks Shawn.

I cannot complain, because it has actually worked well for me so far, but...

I still do not understand: if Solr already paginates the results from the
full import, why not do the same for the delta? It is almost the same query:
`select id from t where t.lastmod > ${solrTime}`
`select * from t where id IN ${dataimporter.ids} limit 1000 offset 0`
and so on.


Re: Data import batch mode for delta

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/5/2018 7:31 PM, gadelkareem wrote:
> Why does the deltaImportQuery use "where id='${dataimporter.id}'" instead of
> something like "where id IN ('${dataimporter.id}')"?

Because there's only one value for that property.

If the deltaQuery returns a million rows, then deltaImportQuery is going 
to be executed a million times.  Once for each row returned by the 
deltaQuery.
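
To make the query count concrete, here is a minimal sketch of that access pattern in Python with an in-memory SQLite database. It is not DIH code. The table `t` and column `lastmod` come from the queries quoted in this thread; the `title` column, the function name, and the sample data are assumptions made up for illustration.

```python
import sqlite3


def run_delta_import(conn, last_index_time):
    """Mimic the delta flow: one deltaQuery, then one lookup per returned id."""
    queries = 0

    # "deltaQuery": collect the primary keys of rows changed since the last import.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM t WHERE lastmod > ?", (last_index_time,))]
    queries += 1

    docs = []
    for row_id in ids:
        # "deltaImportQuery": run once for every id the deltaQuery returned.
        docs.append(conn.execute(
            "SELECT id, title, lastmod FROM t WHERE id = ?",
            (row_id,)).fetchone())
        queries += 1
    return docs, queries


# Hypothetical sample data: ten rows whose lastmod equals their id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, title TEXT, lastmod INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i, f"doc {i}", i) for i in range(1, 11)])

docs, queries = run_delta_import(conn, 5)
print(len(docs), queries)
```

With five changed rows the sketch issues six queries: one to find the ids plus one per id. The total grows linearly with the number of changed rows.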

That IS as inefficient as it sounds.  Think of the dataimport handler as 
a stop-gap solution -- to help you get started with loading data from a 
database, until you can write a proper application to do your indexing.
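
As a rough idea of what such an application could do differently, the sketch below fetches the changed rows in batches with an IN clause, so the number of queries depends on the batch count rather than the row count. This is only an illustration under assumptions: it uses Python with an in-memory SQLite database, the `title` column, function name, and batch size are hypothetical, and the step that would send each batch to Solr is left out.

```python
import sqlite3


def fetch_changed_in_batches(conn, last_index_time, batch_size=500):
    """Return changed rows grouped into batches, one IN query per batch."""
    # One query to find every changed id, as a deltaQuery would.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM t WHERE lastmod > ? ORDER BY id", (last_index_time,))]

    batches = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        # Build one "?" placeholder per id so the values stay parameterized.
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT id, title, lastmod FROM t "
            f"WHERE id IN ({placeholders}) ORDER BY id",
            chunk).fetchall()
        # A real indexer would send `rows` to Solr here; omitted in this sketch.
        batches.append(rows)
    return batches


# Hypothetical sample data: 25 rows whose lastmod equals their id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, title TEXT, lastmod INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i, f"doc {i}", i) for i in range(1, 26)])

batches = fetch_changed_in_batches(conn, 15, batch_size=4)
print([len(batch) for batch in batches])
```

Ten changed rows with a batch size of 4 need three row-fetching queries instead of ten. The batch size is capped by how many bound parameters the database accepts in one statement, so it should be chosen with that limit in mind.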

Thanks,
Shawn