You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by stockii <st...@shopgate.com> on 2010/06/01 13:12:33 UTC

DIH, Full-Import, DB and Performance.

Hello..

We have about 4 Million Products in our our Database and the Import takes
about 1,5 hours. In this Time is the Performance of the Database very bad
and our Server crashed sometimes. It's seems that DIH send only ONE select
to the db ?!?! is that right ? 

all other processes cannot connect to the db =(...

thats very bad !!!! what is the best solution to make a full-import better,
so that we dont have such problems !?!?!?!? an import with PHP takes tooooo
long for us !

thats the query: 
query="select *
                FROM items_de.shop_items as i, shops as s 
	WHERE s.id=i.shop_id AND s.is_active=1 AND s.is_testmode=0 AND parent_id IS
NULL"  >

AND the Mappings for the categories:
<entity name="item_category" pk="id, shop_item_id" dataSource="items"
 query="select shop_category_id, order_index FROM
shop_item_category_mappings WHERE     shop_item_id='${item.id}'" >
			   			
<field column="shop_category_id" name="shop_category_id" />
<field column="order_index" 	 name="popularity"/>
			
<entity name="categoryName" pk="id" dataSource="items"
	query="select name, path from shop_categories where id =
'${item_category.shop_category_id}'" >
			
<field column="name" name="category" />
				
</entity>

what do you thing make it better ? can the dih use other options ? make it
sense to use anoter batchSize = "-1" ????

-- 
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Full-Import-DB-and-Performance-tp861068p861068.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: DIH, Full-Import, DB and Performance.

Posted by stockii <st...@shopgate.com>.
another question ....

we have sometimes a load from over 3,.. on our server and only from
different tomcat instances. no import is running and not much requests send
to solr. 

we have 4 cores running for our search. 
2 cores have each 4 Million doc's and the other two cores have each around
200.000 doc's.

why is the load so much ????
-- 
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Full-Import-DB-and-Performance-tp861068p861262.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: DIH, Full-Import, DB and Performance.

Posted by stockii <st...@shopgate.com>.
my batchSize is -1 and the load ist to big for us. why i should increase it ? 

what is a normal serverload ? our server is a fast server. 4 cores 3 GB Ram
.... but we dont want a serverload from over 2 when index a starts.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Full-Import-DB-and-Performance-tp861068p864297.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: DIH, Full-Import, DB and Performance.

Posted by cb...@job.com.
Performance is dependent on your server/data and the batchsize. To reduce
the server load experiment with different batchsize settings. The higher the
batch size the faster the import and the higher your SQL Server load will
be. Try starting with a small batch and then gradually increasing it.

Colin.

> -----Original Message-----
> From: stockii [mailto:stock@shopgate.com]
> Sent: Tuesday, June 01, 2010 12:31 PM
> To: solr-user@lucene.apache.org
> Subject: RE: DIH, Full-Import, DB and Performance.
> 
> 
> thx for the reply =)
> 
> 
> i try out selectMethod="cursor"  but the load of the server is going
> bigger
> and bigger during a import =(
> 
> selectMethod="cursor" only solve the problem with the locking ? right ?
> --
> View this message in context: http://lucene.472066.n3.nabble.com/DIH-
> Full-Import-DB-and-Performance-tp861068p862043.html
> Sent from the Solr - User mailing list archive at Nabble.com.




RE: DIH, Full-Import, DB and Performance.

Posted by stockii <st...@shopgate.com>.
thx for the reply =)


i try out selectMethod="cursor"  but the load of the server is going bigger
and bigger during a import =(

selectMethod="cursor" only solve the problem with the locking ? right ? 
-- 
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Full-Import-DB-and-Performance-tp861068p862043.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: DIH, Full-Import, DB and Performance.

Posted by cb...@job.com.
The settings and defaults will depend on which version of SQL Server you are
using and which version of the JDBC driver.

The default for resonseBuffering was changed to adaptive after version 1.2
so unless you are using 1.2 or earlier you don't need to set it to adaptive.

Also if I remember correctly the batchsize will only take affect if you are
using cursors, the default is for all data to be sent to the client
(selectMethod is direct).

Using the default settings for the MS sqljdbc driver caused locking issues
in our database. As soon as the full import started shared locks would be
set on all rows and wouldn't be removed until all the data had been sent,
which for us would be around 30 minutes. During that time no updates could
get an exclusive lock which of course led to huge problems.

Setting selectMethod="cursor" solved the problem for us although it does
slow down the full import.

Another option that worked for us was to not set the selectMethod and set
readOnly="true", but be sure you understand the implications. This causes
all data to be sent to the client (which is the default), giving maximum
performance, and causes no locks to be set which resolves the other issues.
However, this sets transaction isolation to TRANSACTION_READ_UNCOMMITTED
which will cause the select statement to ignore any locks when getting data
so the consistency of the data cannot be guaranteed, which may or may not be
an issue depending on your particular situation.


Colin.

> -----Original Message-----
> From: stockii [mailto:stock@shopgate.com]
> Sent: Tuesday, June 01, 2010 7:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: DIH, Full-Import, DB and Performance.
> 
> 
> do you think that the option
> 
> responseBuffer="adaptive"
> 
> should solve my problem ?
> 
> 
> From DIH FAQ ...:
> 
> I'm using DataImportHandler with MS SQL Server database with sqljdbc
> driver.
> DataImportHandler is going out of memory. I tried adjustng the
> batchSize
> values but they don't seem to make any difference. How do I fix this?
> 
> There's a connection property called responseBuffering in the sqljdbc
> driver
> whose default value is "full" which causes the entire result set to be
> fetched. See http://msdn.microsoft.com/en-us/library/ms378988.aspx for
> more
> details. You can set this property to "adaptive" to keep the driver
> from
> getting everything into memory. Connection properties like this can be
> set
> as an attribute (responseBuffering="adaptive") in the dataSource
> configuration OR directly in the jdbc url specified in
> DataImportHandler's
> dataSource configuration.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/DIH-
> Full-Import-DB-and-Performance-tp861068p861134.html
> Sent from the Solr - User mailing list archive at Nabble.com.




Re: DIH, Full-Import, DB and Performance.

Posted by stockii <st...@shopgate.com>.
do you think that the option 

responseBuffer="adaptive" 

should solve my problem ? 


>From DIH FAQ ...: 

I'm using DataImportHandler with MS SQL Server database with sqljdbc driver.
DataImportHandler is going out of memory. I tried adjustng the batchSize
values but they don't seem to make any difference. How do I fix this?

There's a connection property called responseBuffering in the sqljdbc driver
whose default value is "full" which causes the entire result set to be
fetched. See http://msdn.microsoft.com/en-us/library/ms378988.aspx for more
details. You can set this property to "adaptive" to keep the driver from
getting everything into memory. Connection properties like this can be set
as an attribute (responseBuffering="adaptive") in the dataSource
configuration OR directly in the jdbc url specified in DataImportHandler's
dataSource configuration.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Full-Import-DB-and-Performance-tp861068p861134.html
Sent from the Solr - User mailing list archive at Nabble.com.