Posted to dev@lucene.apache.org by 郭芸 <mi...@gmail.com> on 2010/09/07 03:53:43 UTC

About Solr DataImportHandler

Dear all:
I use Solr DataImportHandler's JdbcDataSource to import data from SQL Server 2005 into Solr, but my table is very big, about 300 GB, and I found that Solr imports the data into memory first and then writes it to the index directory. So if the data is too big, an OutOfMemoryException is triggered.
I want to solve this problem. How can I do it? Can anybody help me? Thank you.

2010-09-07 



郭芸 

Re: Re: Re: About Solr DataImportHandler

Posted by 郭芸 <mi...@gmail.com>.
Thank you very much! With your help I now know how to resolve my problem.
I have subscribed to solr-user@lucene.apache.org,
and I hope that someday I can join this open source community and contribute something!

2010-09-08 



郭芸 




Re: Re: About Solr DataImportHandler

Posted by Alexey Serba <as...@gmail.com>.
> 2. But there are some problems:
> if the table is very big, Solr will spend a long time importing and
> indexing, maybe a day or more. So if network problems or other failures
> occur during this time, Solr may not remember which documents have been
> processed, and if we continue the data import, we do not know where to
> start.

You can _batch_ import your data with the full-import command by
providing an additional request parameter (see
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
), i.e.

query="SELECT * FROM my_table ORDER BY id LIMIT 1000000 OFFSET
${dataimporter.request.offset}"

and then call the full-import command several times:
1) /dataimport?clean=true&offset=0
2) /dataimport?clean=false&offset=1000000
3) /dataimport?clean=false&offset=2000000
etc
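
For illustration, a minimal driver for this batched import could look like the sketch below. The host, handler path, table size, and polling interval are assumptions, not values from this thread; only the clean/offset parameters come from the commands above. Note that full-import returns immediately, so the sketch polls the handler status before starting the next batch. Note also that LIMIT ... OFFSET is MySQL/PostgreSQL syntax; on SQL Server 2005 the paging query would need something like TOP with a WHERE id > last_id filter, or ROW_NUMBER(), instead.

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical driver that issues the batched full-import requests.
public class BatchImportDriver {

    static final String DIH = "http://localhost:8983/solr/dataimport"; // assumed URL

    public static void main(String[] args) throws Exception {
        final long batch = 1_000_000L;
        final long total = 10_000_000L; // assumed table size
        for (long offset = 0; offset < total; offset += batch) {
            // clean=true only on the first batch so later batches append
            String clean = (offset == 0) ? "true" : "false";
            get(DIH + "?command=full-import&clean=" + clean + "&offset=" + offset);
            // full-import is asynchronous; wait until the handler is idle
            while (get(DIH + "?command=status").contains("busy")) {
                Thread.sleep(10_000);
            }
        }
    }

    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes());
        }
    }
}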

// Please use the solr-user@lucene.apache.org mailing list for such
questions. _dev_ is not the appropriate place for this.



Re: Re: About Solr DataImportHandler

Posted by 郭芸 <mi...@gmail.com>.
Thank you for your reply, it is very important to me.
1. I agree with you. By reading Solr's source code, I found that this problem can be resolved by configuring db-data-config.xml like this (my database is SQL Server 2005; this will not work for other databases):
<dataSource name="dsSqlServer" type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" batchSize="3000"
        url="jdbc:sqlserver://192.168.1.5:1433; DatabaseName=testDatabase;responseBuffering=adaptive;selectMethod=cursor" user="sa" password="12345" />

Add responseBuffering=adaptive;selectMethod=cursor to the url attribute, and Solr will set these parameters by itself:
c.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
With this configuration, Solr can import a big table's data into the index directory (see the sketch after this list).


2. But there are some problems:
if the table is very big, Solr will spend a long time importing and indexing, maybe a day or more. So if network problems or other failures occur during this time, Solr may not remember which documents have been processed, and if we continue the data import, we do not know where to start.

3. I am sorry for my bad English. I hope you can understand what I mean.
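
As a self-contained illustration of point 1, the following is roughly the streaming read that the responseBuffering=adaptive;selectMethod=cursor settings enable. The connection details mirror the configuration above; the table and column names are placeholders invented for the example.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingReadSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://192.168.1.5:1433;"
                + "DatabaseName=testDatabase;"
                + "responseBuffering=adaptive;selectMethod=cursor";
        try (Connection c = DriverManager.getConnection(url, "sa", "12345");
             // forward-only, read-only statement, as Solr creates internally
             Statement st = c.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(3000); // matches batchSize="3000" above
            // placeholder table and column names
            try (ResultSet rs = st.executeQuery("SELECT id FROM my_table")) {
                while (rs.next()) {
                    // rows arrive in driver-sized batches; the 300 GB table
                    // is never held in memory all at once
                    process(rs.getString("id"));
                }
            }
        }
    }

    private static void process(String id) { /* hand the row to the indexer */ }
}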


2010-09-08 



郭芸 




Re: About Solr DataImportHandler

Posted by Alexey Serba <as...@gmail.com>.
> I found that Solr imports the data into memory first, then writes it to the index directory.
That's not really true. DataImportHandler streams the results of the
database query and adds documents to the index as it goes, so it
shouldn't load all of the database data into memory. Disabling
autoCommit, warming queries, and spellcheckers usually decreases the
amount of memory required during indexing.

Please share your hardware details, JVM options, solrconfig and schema
configuration, etc.



2010/9/7 郭芸 <mi...@gmail.com>:
> Dear all:
> I use Solr DataImportHandler's JdbcDataSource to import data from SQL
> Server 2005 into Solr, but my table is very big, about 300 GB, and I
> found that Solr imports the data into memory first and then writes it
> to the index directory. So if the data is too big, an
> OutOfMemoryException is triggered.
> I want to solve this problem. How can I do it? Can anybody help me?
> Thank you.
>
> 2010-09-07
> ________________________________
> 郭芸



Re: About Solr DataImportHandler

Posted by Fu Shunkai 傅顺开 <fu...@peptalk.cn>.
Try setting batchSize to -1.
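
For context, the DIH wiki notes that batchSize="-1" is passed to the JDBC statement as a fetch size of Integer.MIN_VALUE, which MySQL's Connector/J treats as a request to stream rows one at a time instead of buffering the whole result set. A rough sketch of what that amounts to is below; other drivers may reject a negative fetch size, and for SQL Server the cursor settings shown earlier in this thread serve the same purpose.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingFetchSize {
    // What batchSize="-1" amounts to inside DIH's JdbcDataSource (sketch)
    static ResultSet streamAll(Connection c, String sql) throws SQLException {
        Statement st = c.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(Integer.MIN_VALUE); // MySQL streaming hint
        return st.executeQuery(sql);
    }
}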


2010-09-08 



傅顺开 (Fu Shunkai)
苏州广达友讯技术有限公司 (Suzhou Guangda Youxun Technology Co., Ltd.)
1355 Jinjihu Avenue, Suzhou Industrial Park, Jiangsu
International Science Park 151A, 215021
Tel: (512) 6288-8255 ext. 612
Fax: (512) 6288-8155
Mobile: (0) 158-5018-8480
Email: fusk@peptalk.cn
http://www.bedo.cn, http://k.ai, http://www.lbs.org.cn
 


