You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by "Mannott, Birgit" <B....@klopotek.de> on 2017/08/15 14:09:00 UTC

Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

Hi,

I'm using solr 6.6.0 and I have to do a complex data import from an oracle db concerning 3.500.000 data rows.
For each row I have 15 additional entities. That means that more than 52 Million selects are send to the database.
For every select that is done I optimized the oracle execution path by creating indizes.
The execution plans are ok.
But the import still lasts 12 hours.

I think, the main remaining problem is that oracle cursor sharing is not used and that for every select a hard parse is done.
Solr does not use binding variables. This would be the easiest way to use oracle cursor sharing. But I didn't find anything about influencing the way solr builds select statements.
I could force oracle cursor sharing without binding variables but I have to do this configuration for the session. I'm not allowed to change the configuration of the whole database system.

Is there a way to execute a command like "ALTER SESSION SET cursor_sharing = FORCE;" after getting the connection for processing an entity?

Thanks,
Birgit

RE: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

Posted by Phil Scadden <P....@gns.cri.nz>.
Perhaps there is potential to optimize with some PLSQL functions on Oracle side to do as much work within database as possible and have the text indexers only access a view referencing that function. Also, the obvious optimization is a record-updated timestamp so that every time indexer runs, only changed data is managed.

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org]
Sent: Wednesday, 16 August 2017 5:42 a.m.
To: solr-user@lucene.apache.org
Subject: Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

On 8/15/2017 8:09 AM, Mannott, Birgit wrote:
> I'm using solr 6.6.0 and I have to do a complex data import from an oracle db concerning 3.500.000 data rows.
> For each row I have 15 additional entities. That means that more than 52 Million selects are send to the database.
> For every select that is done I optimized the oracle execution path by creating indizes.
> The execution plans are ok.
> But the import still lasts 12 hours.

I think the reason it takes 12 hours is because there are 52 million SELECT statements.  That many statements over 12 hours is an average of
1200 per second.  This sounds like pretty good database performance.

> Is there a way to execute a command like "ALTER SESSION SET cursor_sharing = FORCE;" after getting the connection for processing an entity?

I think that most JDBC drivers (by default) don't allow multiple SQL statements to be sent in a single request, so commands like "SELECT FOO; SELECT BAR" won't work.  The idea behind denying this kind of command is protection against SQL injection attacks.  There is likely a JDBC URL parameter for the Oracle driver that would allow that ... and if there is, then you could add that to the connection URL in the DIH config to allow putting the ALTER SESSION statement before SELECT in your DIH entity.

The Oracle driver might also have a JDBC URL parameter to turn on the cursor sharing you're interested in.  That would be the best way to handle it, if that is an option.  You're going to need to consult Oracle documentation or an Oracle support resource to find out what URL parameter options there are for their driver.

I have near zero experience with Oracle databases, but I suspect that even with cursor sharing, you're still going to have the sheer number of SELECT statements as a bottleneck.  If there is a performance improvement, it probably won't be dramatic.

Thanks,
Shawn

Notice: This email and any attachments are confidential and may not be used, published or redistributed without the prior written consent of the Institute of Geological and Nuclear Sciences Limited (GNS Science). If received in error please destroy and immediately notify GNS Science. Do not copy or disclose the contents.

Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/15/2017 8:09 AM, Mannott, Birgit wrote:
> I'm using solr 6.6.0 and I have to do a complex data import from an oracle db concerning 3.500.000 data rows.
> For each row I have 15 additional entities. That means that more than 52 Million selects are send to the database.
> For every select that is done I optimized the oracle execution path by creating indizes.
> The execution plans are ok.
> But the import still lasts 12 hours.

I think the reason it takes 12 hours is because there are 52 million
SELECT statements.  That many statements over 12 hours is an average of
1200 per second.  This sounds like pretty good database performance.

> Is there a way to execute a command like "ALTER SESSION SET cursor_sharing = FORCE;" after getting the connection for processing an entity?

I think that most JDBC drivers (by default) don't allow multiple SQL
statements to be sent in a single request, so commands like "SELECT FOO;
SELECT BAR" won't work.  The idea behind denying this kind of command is
protection against SQL injection attacks.  There is likely a JDBC URL
parameter for the Oracle driver that would allow that ... and if there
is, then you could add that to the connection URL in the DIH config to
allow putting the ALTER SESSION statement before SELECT in your DIH entity.

The Oracle driver might also have a JDBC URL parameter to turn on the
cursor sharing you're interested in.  That would be the best way to
handle it, if that is an option.  You're going to need to consult Oracle
documentation or an Oracle support resource to find out what URL
parameter options there are for their driver.

I have near zero experience with Oracle databases, but I suspect that
even with cursor sharing, you're still going to have the sheer number of
SELECT statements as a bottleneck.  If there is a performance
improvement, it probably won't be dramatic.

Thanks,
Shawn


[bulk]: Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

Posted by "Mannott, Birgit" <B....@klopotek.de>.
Thanks to all for your support. Many good ideas!
But meanwhile I found a way using the Oracle "AFTER LOGON" Trigger where I am able to identify if it is a connection from solr and then switch to cursor sharing. 

By the way cursor sharing reduced the import time from 12 to 6 hours !!
So it would be really great if the Data Import Handler would be able to use binding variables in selects so that cursor sharing could be done easier.

Regards,
Birgit


-----Original Message-----
From: Mannott, Birgit [mailto:B.Mannott@klopotek.de] 
Sent: Tuesday, August 15, 2017 4:58 PM
To: solr-user@lucene.apache.org
Subject: [bulk]: RE: [bulk]: Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

Yes, I'm using Data Import Handler and I would prefer a solution for this way of import because it's already tested and the imported data is ok and everything is fine. 
I just have to speed it up a little...

But thanks for your info. Next time I'll try indexing with solrj.

Regards,
Birgit


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, August 15, 2017 4:33 PM
To: solr-user <so...@lucene.apache.org>
Subject: [bulk]: Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

I presume you're using Data Import Handler? An alternative when you get into complex imports is to use a SolrJ client, here's a sample.
That way you can use whatever tools the particular JDBC connector will allow and can be much faster.

https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick

On Tue, Aug 15, 2017 at 7:09 AM, Mannott, Birgit <B....@klopotek.de> wrote:
> Hi,
>
> I'm using solr 6.6.0 and I have to do a complex data import from an oracle db concerning 3.500.000 data rows.
> For each row I have 15 additional entities. That means that more than 52 Million selects are send to the database.
> For every select that is done I optimized the oracle execution path by creating indizes.
> The execution plans are ok.
> But the import still lasts 12 hours.
>
> I think, the main remaining problem is that oracle cursor sharing is not used and that for every select a hard parse is done.
> Solr does not use binding variables. This would be the easiest way to use oracle cursor sharing. But I didn't find anything about influencing the way solr builds select statements.
> I could force oracle cursor sharing without binding variables but I have to do this configuration for the session. I'm not allowed to change the configuration of the whole database system.
>
> Is there a way to execute a command like "ALTER SESSION SET cursor_sharing = FORCE;" after getting the connection for processing an entity?
>
> Thanks,
> Birgit

Re: [bulk]: Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

Posted by David Hastings <ha...@gmail.com>.
If you dont want to use your own Solj code, why not try many concurrent
indexers that index different data sets.  So run seven indexers each
getting 500,000 rows at the exact same time perhaps.  Its a hack, if it
works, but if you have the machinery to do it, why not. or use the
deltaquery, but i havent used the DIH that much to know if that will help
you.

On Tue, Aug 15, 2017 at 10:57 AM, Mannott, Birgit <B....@klopotek.de>
wrote:

> Yes, I'm using Data Import Handler and I would prefer a solution for this
> way of import because it's already tested and the imported data is ok and
> everything is fine.
> I just have to speed it up a little...
>
> But thanks for your info. Next time I'll try indexing with solrj.
>
> Regards,
> Birgit
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Tuesday, August 15, 2017 4:33 PM
> To: solr-user <so...@lucene.apache.org>
> Subject: [bulk]: Re: Optimizing Dataimport from Oracle; cursor sharing;
> changing oracle session parameters
>
> I presume you're using Data Import Handler? An alternative when you get
> into complex imports is to use a SolrJ client, here's a sample.
> That way you can use whatever tools the particular JDBC connector will
> allow and can be much faster.
>
> https://lucidworks.com/2012/02/14/indexing-with-solrj/
>
> Best,
> Erick
>
> On Tue, Aug 15, 2017 at 7:09 AM, Mannott, Birgit <B....@klopotek.de>
> wrote:
> > Hi,
> >
> > I'm using solr 6.6.0 and I have to do a complex data import from an
> oracle db concerning 3.500.000 data rows.
> > For each row I have 15 additional entities. That means that more than 52
> Million selects are send to the database.
> > For every select that is done I optimized the oracle execution path by
> creating indizes.
> > The execution plans are ok.
> > But the import still lasts 12 hours.
> >
> > I think, the main remaining problem is that oracle cursor sharing is not
> used and that for every select a hard parse is done.
> > Solr does not use binding variables. This would be the easiest way to
> use oracle cursor sharing. But I didn't find anything about influencing the
> way solr builds select statements.
> > I could force oracle cursor sharing without binding variables but I have
> to do this configuration for the session. I'm not allowed to change the
> configuration of the whole database system.
> >
> > Is there a way to execute a command like "ALTER SESSION SET
> cursor_sharing = FORCE;" after getting the connection for processing an
> entity?
> >
> > Thanks,
> > Birgit
>

RE: [bulk]: Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

Posted by "Mannott, Birgit" <B....@klopotek.de>.
Yes, I'm using Data Import Handler and I would prefer a solution for this way of import because it's already tested and the imported data is ok and everything is fine. 
I just have to speed it up a little...

But thanks for your info. Next time I'll try indexing with solrj.

Regards,
Birgit


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, August 15, 2017 4:33 PM
To: solr-user <so...@lucene.apache.org>
Subject: [bulk]: Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

I presume you're using Data Import Handler? An alternative when you get into complex imports is to use a SolrJ client, here's a sample.
That way you can use whatever tools the particular JDBC connector will allow and can be much faster.

https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick

On Tue, Aug 15, 2017 at 7:09 AM, Mannott, Birgit <B....@klopotek.de> wrote:
> Hi,
>
> I'm using solr 6.6.0 and I have to do a complex data import from an oracle db concerning 3.500.000 data rows.
> For each row I have 15 additional entities. That means that more than 52 Million selects are send to the database.
> For every select that is done I optimized the oracle execution path by creating indizes.
> The execution plans are ok.
> But the import still lasts 12 hours.
>
> I think, the main remaining problem is that oracle cursor sharing is not used and that for every select a hard parse is done.
> Solr does not use binding variables. This would be the easiest way to use oracle cursor sharing. But I didn't find anything about influencing the way solr builds select statements.
> I could force oracle cursor sharing without binding variables but I have to do this configuration for the session. I'm not allowed to change the configuration of the whole database system.
>
> Is there a way to execute a command like "ALTER SESSION SET cursor_sharing = FORCE;" after getting the connection for processing an entity?
>
> Thanks,
> Birgit

Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

Posted by Walter Underwood <wu...@wunderwood.org>.
This might be a hack, but the CSV importer is really fast. Run the query in your favorite command line and export to CSV, then load it.

You can even make batches. Maybe use ranges of the ID, then delete by query for that range.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 15, 2017, at 8:46 AM, Stefan Matheis <st...@mathe.is> wrote:
> 
> Birgit,
> 
> any chance to utilise one of the caching strategies that DIH offers?
> 
> Like building a complete map for one of the subentities? That would mean
> reading the whole table at the beginning and then only doing lookups by key.
> 
> Or getting data from subentities with joins in your main entity?
> 
> Heavily depends on the amount of data we're talking about - but might be
> worth a thought.
> 
> Best
> -Stefan
> 
> On Aug 15, 2017 4:33 PM, "Erick Erickson" <er...@gmail.com> wrote:
> 
> I presume you're using Data Import Handler? An alternative when you
> get into complex imports is to use a SolrJ client, here's a sample.
> That way you can use whatever tools the particular JDBC connector will
> allow and can be much faster.
> 
> https://lucidworks.com/2012/02/14/indexing-with-solrj/
> 
> Best,
> Erick
> 
> On Tue, Aug 15, 2017 at 7:09 AM, Mannott, Birgit <B....@klopotek.de>
> wrote:
>> Hi,
>> 
>> I'm using solr 6.6.0 and I have to do a complex data import from an
> oracle db concerning 3.500.000 data rows.
>> For each row I have 15 additional entities. That means that more than 52
> Million selects are send to the database.
>> For every select that is done I optimized the oracle execution path by
> creating indizes.
>> The execution plans are ok.
>> But the import still lasts 12 hours.
>> 
>> I think, the main remaining problem is that oracle cursor sharing is not
> used and that for every select a hard parse is done.
>> Solr does not use binding variables. This would be the easiest way to use
> oracle cursor sharing. But I didn't find anything about influencing the way
> solr builds select statements.
>> I could force oracle cursor sharing without binding variables but I have
> to do this configuration for the session. I'm not allowed to change the
> configuration of the whole database system.
>> 
>> Is there a way to execute a command like "ALTER SESSION SET
> cursor_sharing = FORCE;" after getting the connection for processing an
> entity?
>> 
>> Thanks,
>> Birgit


Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

Posted by Stefan Matheis <st...@mathe.is>.
Birgit,

any chance to utilise one of the caching strategies that DIH offers?

Like building a complete map for one of the subentities? That would mean
reading the whole table at the beginning and then only doing lookups by key.

Or getting data from subentities with joins in your main entity?

Heavily depends on the amount of data we're talking about - but might be
worth a thought.

Best
-Stefan

On Aug 15, 2017 4:33 PM, "Erick Erickson" <er...@gmail.com> wrote:

I presume you're using Data Import Handler? An alternative when you
get into complex imports is to use a SolrJ client, here's a sample.
That way you can use whatever tools the particular JDBC connector will
allow and can be much faster.

https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick

On Tue, Aug 15, 2017 at 7:09 AM, Mannott, Birgit <B....@klopotek.de>
wrote:
> Hi,
>
> I'm using solr 6.6.0 and I have to do a complex data import from an
oracle db concerning 3.500.000 data rows.
> For each row I have 15 additional entities. That means that more than 52
Million selects are send to the database.
> For every select that is done I optimized the oracle execution path by
creating indizes.
> The execution plans are ok.
> But the import still lasts 12 hours.
>
> I think, the main remaining problem is that oracle cursor sharing is not
used and that for every select a hard parse is done.
> Solr does not use binding variables. This would be the easiest way to use
oracle cursor sharing. But I didn't find anything about influencing the way
solr builds select statements.
> I could force oracle cursor sharing without binding variables but I have
to do this configuration for the session. I'm not allowed to change the
configuration of the whole database system.
>
> Is there a way to execute a command like "ALTER SESSION SET
cursor_sharing = FORCE;" after getting the connection for processing an
entity?
>
> Thanks,
> Birgit

Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

Posted by Erick Erickson <er...@gmail.com>.
I presume you're using Data Import Handler? An alternative when you
get into complex imports is to use a SolrJ client, here's a sample.
That way you can use whatever tools the particular JDBC connector will
allow and can be much faster.

https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick

On Tue, Aug 15, 2017 at 7:09 AM, Mannott, Birgit <B....@klopotek.de> wrote:
> Hi,
>
> I'm using solr 6.6.0 and I have to do a complex data import from an oracle db concerning 3.500.000 data rows.
> For each row I have 15 additional entities. That means that more than 52 Million selects are send to the database.
> For every select that is done I optimized the oracle execution path by creating indizes.
> The execution plans are ok.
> But the import still lasts 12 hours.
>
> I think, the main remaining problem is that oracle cursor sharing is not used and that for every select a hard parse is done.
> Solr does not use binding variables. This would be the easiest way to use oracle cursor sharing. But I didn't find anything about influencing the way solr builds select statements.
> I could force oracle cursor sharing without binding variables but I have to do this configuration for the session. I'm not allowed to change the configuration of the whole database system.
>
> Is there a way to execute a command like "ALTER SESSION SET cursor_sharing = FORCE;" after getting the connection for processing an entity?
>
> Thanks,
> Birgit