Posted to solr-user@lucene.apache.org by vrindavda <vr...@gmail.com> on 2017/05/31 09:21:35 UTC

Number of requests spike up, when i do the delta Import.

Hello,
The number of requests spikes whenever I run a delta import in Solr.
Please help me understand why.


<http://lucene.472066.n3.nabble.com/file/n4338162/solr.jpg> 



--
View this message in context: http://lucene.472066.n3.nabble.com/Number-of-requests-spike-up-when-i-do-the-delta-Import-tp4338162.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Number of requests spike up, when i do the delta Import.

Posted by vrindavda <vr...@gmail.com>.
I found this article helpful.

https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport




Re: Number of requests spike up, when i do the delta Import.

Posted by Erick Erickson <er...@gmail.com>.
A similar pattern should work with .NET; all that's
necessary is a database driver (the .NET equivalent of a JDBC
driver) for connecting to the database and a connection to a Solr node.

SolrNet will not be as performant as SolrJ, I'd guess,
since there's no equivalent to CloudSolrClient. You
can still use SolrNet; any connection to any Solr node
will "do the right thing".

Best,
Erick

On Fri, Jun 2, 2017 at 4:01 AM, Rick Leir <rl...@leirtech.com> wrote:
> Vrin
> We had a good speedup from enabling a SQL cache. You also need to avoid updating the DB tables so the cache does not get flushed.
> Cheers -- Rick

Re: Number of requests spike up, when i do the delta Import.

Posted by Rick Leir <rl...@leirtech.com>.
Vrin
We had a good speedup from enabling a SQL cache. You also need to avoid updating the DB tables so the cache does not get flushed. 
Cheers -- Rick

On June 2, 2017 4:49:20 AM EDT, vrindavda <vr...@gmail.com> wrote:
>Thanks Erick,
>
>Could you please suggest an alternative that I could use with SolrNet?
>
>@jlman, I tried your approach. It does reduce the number of requests, but
>the delta-import still takes longer than the full-import, so there is no
>improvement in performance.

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Number of requests spike up, when i do the delta Import.

Posted by vrindavda <vr...@gmail.com>.
Thanks Erick,

Could you please suggest an alternative that I could use with SolrNet?

@jlman, I tried your approach. It does reduce the number of requests, but
the delta-import still takes longer than the full-import, so there is no
improvement in performance.




Re: Number of requests spike up, when i do the delta Import.

Posted by Erick Erickson <er...@gmail.com>.
Well, personally I like to use SolrJ rather than DIH for both
debugging ease and the reasons outlined here:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

FWIW
Erick

On Thu, Jun 1, 2017 at 7:59 AM, Josh Lincoln <jo...@gmail.com> wrote:
> I had the same issue as Vrinda and found a hacky way to limit the number of
> times deltaImportQuery was executed.
>
> As designed, Solr executes *deltaQuery* to get a list of ids that need to
> be indexed. For each of those it executes *deltaImportQuery*, which is
> typically very similar to the full *query*.
>
> I constructed a deltaQuery to purposely return only 1 row. E.g.
>
>      deltaQuery = "SELECT id FROM table WHERE rownum=1"
>
> (Written for Oracle; other databases likely require a different syntax. It
> also occurred to me that you could probably include the
> date >= '${dataimporter.last_index_time}' filter here, so that this returns
> 0 rows if no data has changed.)
>
> Since *deltaImportQuery* now only gets called once, I needed to add the
> filter logic to *deltaImportQuery* to select only the changed rows (that
> logic is normally in *deltaQuery*). E.g.
>
>     deltaImportQuery = [normal import query] WHERE date >= '${dataimporter.last_index_time}'
>
>
> This significantly reduced the number of database queries for delta
> imports, and sped up the processing.

Re: Number of requests spike up, when i do the delta Import.

Posted by Josh Lincoln <jo...@gmail.com>.
I had the same issue as Vrinda and found a hacky way to limit the number of
times deltaImportQuery was executed.

As designed, Solr executes *deltaQuery* to get a list of ids that need to
be indexed. For each of those it executes *deltaImportQuery*, which is
typically very similar to the full *query*.

I constructed a deltaQuery to purposely return only 1 row. E.g.

     deltaQuery = "SELECT id FROM table WHERE rownum=1"

(Written for Oracle; other databases likely require a different syntax. It
also occurred to me that you could probably include the
date >= '${dataimporter.last_index_time}' filter here, so that this returns
0 rows if no data has changed.)

Since *deltaImportQuery* now only gets called once, I needed to add the
filter logic to *deltaImportQuery* to select only the changed rows (that
logic is normally in *deltaQuery*). E.g.

    deltaImportQuery = [normal import query] WHERE date >= '${dataimporter.last_index_time}'


This significantly reduced the number of database queries for delta
imports, and sped up the processing.
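
The effect on the query count can be illustrated outside Solr. The following is only a sketch in Python against an in-memory SQLite table, not DIH itself: the table name `item`, the `modified` column, the integer timestamps and both function names are made up for the example, and `LIMIT 1` stands in for Oracle's `rownum=1`. It uses the variant that also keeps the date filter in the one-row deltaQuery, so nothing runs when no rows have changed.

```python
import sqlite3

LAST_INDEX_TIME = 150  # stand-in for ${dataimporter.last_index_time}


def make_db(total_rows=1000, changed_rows=50):
    """Create an in-memory table in which `changed_rows` rows are newer than LAST_INDEX_TIME."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, modified INTEGER)")
    rows = [(i, f"item-{i}", 200 if i <= changed_rows else 100)
            for i in range(1, total_rows + 1)]
    conn.executemany("INSERT INTO item VALUES (?, ?, ?)", rows)
    return conn


def standard_delta(conn):
    """One deltaQuery for the changed ids, then one deltaImportQuery per id."""
    queries = 1
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM item WHERE modified >= ?", (LAST_INDEX_TIME,))]
    docs = []
    for doc_id in ids:
        queries += 1
        docs.extend(conn.execute(
            "SELECT id, name FROM item WHERE id = ?", (doc_id,)).fetchall())
    return docs, queries


def single_row_delta(conn):
    """deltaQuery returns at most one row; deltaImportQuery carries the date filter itself."""
    queries = 1
    trigger = conn.execute(
        "SELECT id FROM item WHERE modified >= ? LIMIT 1", (LAST_INDEX_TIME,)).fetchall()
    docs = []
    for _ in trigger:  # runs at most once
        queries += 1
        docs.extend(conn.execute(
            "SELECT id, name FROM item WHERE modified >= ?", (LAST_INDEX_TIME,)).fetchall())
    return docs, queries
```

Both functions return the same set of changed rows. The first issues one query per changed row plus one, while the second issues at most two queries regardless of how many rows changed.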

On Thu, Jun 1, 2017 at 6:07 AM Amrit Sarkar <sa...@gmail.com> wrote:

> Erick,
>
> Thanks for the pointer. Straying from what Vrinda is looking for
> (sorry about that): what if there are no sub-entities, and no
> deltaImportQuery is passed either? I looked into the code and determined
> that it calculates the deltaImportQuery itself, in
> SqlEntityProcessor:getDeltaImportQuery(..)::126.
>
> In that case, a full-import and a delta-import should ideally take a similar
> amount of time to build the docs (fetch next row). I may well be entirely
> wrong here.

Re: Number of requests spike up, when i do the delta Import.

Posted by Amrit Sarkar <sa...@gmail.com>.
Erick,

Thanks for the pointer. Straying from what Vrinda is looking for
(sorry about that): what if there are no sub-entities, and no
deltaImportQuery is passed either? I looked into the code and determined
that it calculates the deltaImportQuery itself, in
SqlEntityProcessor:getDeltaImportQuery(..)::126.

In that case, a full-import and a delta-import should ideally take a similar
amount of time to build the docs (fetch next row). I may well be entirely
wrong here.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Jun 1, 2017 at 1:50 PM, vrindavda <vr...@gmail.com> wrote:

> Thanks Erick,
>
> But how do I solve this? I tried using a stored procedure instead of a plain
> query, but there was no change in performance.
>
> The delta import is processing more documents than the total number of
> documents. In this case the delta import is not helping at all, and I cannot
> switch to a full import each time. This was working fine with less data.
>
> Thank you,
> Vrinda Davda

Re: Number of requests spike up, when i do the delta Import.

Posted by vrindavda <vr...@gmail.com>.
Thanks Erick,

But how do I solve this? I tried using a stored procedure instead of a plain
query, but there was no change in performance.

The delta import is processing more documents than the total number of
documents. In this case the delta import is not helping at all, and I cannot
switch to a full import each time. This was working fine with less data.

Thank you,
Vrinda Davda




Re: Number of requests spike up, when i do the delta Import.

Posted by Erick Erickson <er...@gmail.com>.
This is often caused by the delta query configuration, where sub-entities
may execute a DB request for each row. Is that possible here?
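
To make the per-row cost concrete, here is a rough back-of-the-envelope model in Python. It rests on assumptions that the thread does not state: nothing is cached, a delta import issues one deltaQuery followed by one deltaImportQuery per changed id, and each sub-entity issues one query per parent row. The function names are hypothetical and exist only for this illustration.

```python
def delta_import_queries(changed_rows, sub_entities):
    """deltaQuery once, then a deltaImportQuery plus every sub-entity query per changed row."""
    return 1 + changed_rows * (1 + sub_entities)


def full_import_queries(total_rows, sub_entities):
    """The main query once, then every sub-entity query per row (assuming no caching)."""
    return 1 + total_rows * sub_entities


if __name__ == "__main__":
    # 600 changed rows out of 291,633, the figures reported in this thread.
    for subs in (0, 1, 2):
        print(subs, delta_import_queries(600, subs), full_import_queries(291_633, subs))
```

Under this model, with no sub-entities a full import needs a single query while a delta import over 600 changed rows already needs 601. Each additional sub-entity adds another query per row to both kinds of import.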

Best,
Erick

On Wed, May 31, 2017 at 2:58 AM, vrindavda <vr...@gmail.com> wrote:
> Exactly, the delta import is taking longer than the full import.
>
> Here are the requested details.
>
> When I do the delta import for 600 (of 291,633 total) documents, I get this:
>
> Indexing completed. Added/Updated: 360,000 documents. Deleted 0 documents.
> (Duration: 6m 58s)
>
> For the full import:
>
> Indexing completed. Added/Updated: 291,633 documents. Deleted 0 documents.
> (Duration: 3m 07s)
>
> Thank you,
> Vrinda Davda

Re: Number of requests spike up, when i do the delta Import.

Posted by vrindavda <vr...@gmail.com>.
Exactly, the delta import is taking longer than the full import.

Here are the requested details.

When I do the delta import for 600 (of 291,633 total) documents, I get this:

Indexing completed. Added/Updated: 360,000 documents. Deleted 0 documents.
(Duration: 6m 58s)

For the full import:

Indexing completed. Added/Updated: 291,633 documents. Deleted 0 documents.
(Duration: 3m 07s)

Thank you,
Vrinda Davda




Re: Number of requests spike up, when i do the delta Import.

Posted by Amrit Sarkar <sa...@gmail.com>.
I have been facing a similar issue lately, where full-import takes seconds
while delta-import takes hours.

Can you share some more metrics for the full-import and the delta-import:
requests made, rows fetched, and time taken?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Wed, May 31, 2017 at 2:51 PM, vrindavda <vr...@gmail.com> wrote:

> Hello,
> The number of requests spikes whenever I run a delta import in Solr.
> Please help me understand why.
>
>
> <http://lucene.472066.n3.nabble.com/file/n4338162/solr.jpg>