Posted to solr-user@lucene.apache.org by Alucard <al...@gmail.com> on 2011/06/13 13:13:43 UTC

How to optimize DIH Full import if I am using SQLite and have 70,000, 700,000 or 7 million records?

Hi all

As far as I know, DIH reads all the documents from the database
(I am using SQLite v3) into memory.

Now I would like to ask: if I have a lot of records (say 7 million),
will it put all 7 million records in memory, and how can I avoid that?

There is a piece of documentation that says to set
responseBuffering="adaptive" (MS SQL Server)
or batchSize="-1" (MySQL), but no such attributes are documented for SQLite.
Can we use those parameters? What other parameters can SQLite users use?
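
For reference, the attributes I mean sit on the DIH dataSource element, as in
the minimal sketch below (driver class, URL, and query are placeholders taken
from the examples I found; I do not know whether the SQLite driver honours
anything similar):

<dataConfig>
  <!-- batchSize is the JDBC fetch-size hint; "-1" is the MySQL-specific
       value that makes the driver stream rows -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              batchSize="-1"/>
  <document>
    <entity name="item" query="SELECT id, title FROM item"/>
  </document>
</dataConfig>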

Thank you in advance.

Ellery

Re: How to optimize DIH Full import if I am using SQLite and have 70,000, 700,000 or 7 million records?

Posted by alucard001 <al...@gmail.com>.
Thank you Shalin.

But when I google "sqlite jdbc driver documentation", there are not many pages that describe this kind of parameter.

The most relevant one is this: http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC

Can you please tell me which resources I can refer to?

Thank you again.

Ellery

On June 13, 2011, at 20:43, Shalin Shekhar Mangar <sh...@gmail.com> wrote:

> On Mon, Jun 13, 2011 at 4:43 PM, Alucard <al...@gmail.com> wrote:
> 
>> 
>> As far as I know, DIH reads all the documents from the database
>> (I am using SQLite v3) into memory.
> 
> 
> That is incorrect. DIH does not read all rows into memory; only the set
> of rows needed to create a single Solr document is kept in memory at any
> given time. The documents are streamed.
> 
> 
>> 
>> Now I would like to ask: if I have a lot of records (say 7 million),
>> will it put all 7 million records in memory, and how can I avoid that?
>> 
>> There is a piece of documentation that says to set
>> responseBuffering="adaptive" (MS SQL Server)
>> or batchSize="-1" (MySQL), but no such attributes are documented for SQLite.
>> Can we use those parameters? What other parameters can SQLite users use?
>> 
>> 
> Those parameters are JDBC-driver-specific settings; e.g., the MySQL JDBC
> driver reads all rows into memory unless you set batchSize="-1".
> 
> You'll have to look at the SQLite JDBC driver's docs to see whether it reads
> rows into memory or has a switch to stream rows one at a time to the client.
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.

Re: How to optimize DIH Full import if I am using SQLite and have 70,000, 700,000 or 7 million records?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Mon, Jun 13, 2011 at 4:43 PM, Alucard <al...@gmail.com> wrote:

>
> As far as I know, DIH reads all the documents from the database
> (I am using SQLite v3) into memory.


That is incorrect. DIH does not read all rows into memory; only the set
of rows needed to create a single Solr document is kept in memory at any
given time. The documents are streamed.
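
To illustrate with a minimal sketch (the table and field names are invented):
given the config below, DIH keeps one row of the outer "book" query plus the
matching "author" rows in memory while it builds that document, then moves on
to the next book row.

<document>
  <entity name="book" query="SELECT id, title FROM book">
    <!-- for each book row, only that book's author rows are fetched -->
    <entity name="author"
            query="SELECT name FROM author WHERE book_id='${book.id}'"/>
  </entity>
</document>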


>
> Now I would like to ask: if I have a lot of records (say 7 million),
> will it put all 7 million records in memory, and how can I avoid that?
>
> There is a piece of documentation that says to set
> responseBuffering="adaptive" (MS SQL Server)
> or batchSize="-1" (MySQL), but no such attributes are documented for SQLite.
> Can we use those parameters? What other parameters can SQLite users use?
>
>
Those parameters are JDBC-driver-specific settings; e.g., the MySQL JDBC
driver reads all rows into memory unless you set batchSize="-1".
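
As a minimal sketch (connection details are placeholders): DIH passes
batchSize to the driver as the JDBC fetch size, and "-1" becomes
Integer.MIN_VALUE, which is the MySQL driver's signal to stream results
row by row:

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            batchSize="-1"/>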

You'll have to look at the SQLite JDBC driver's docs to see whether it reads
rows into memory or has a switch to stream rows one at a time to the client.
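
For what it's worth, the SQLite equivalent would look like the sketch below
(the file path is a placeholder). I cannot say how the Xerial driver buffers
results; SQLite's native API is a row-at-a-time cursor, so a thin wrapper may
stream by default, but that is an assumption to verify against the driver's
docs:

<dataSource type="JdbcDataSource"
            driver="org.sqlite.JDBC"
            url="jdbc:sqlite:/path/to/data.db"/>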

-- 
Regards,
Shalin Shekhar Mangar.