Posted to solr-user@lucene.apache.org by AlexxelA <al...@canoe.ca> on 2009/03/23 17:00:53 UTC

Delta import

I'm using the delta-import command.

Here are the deltaQuery and deltaImportQuery I use:

deltaQuery:
select uid from profil_view where last_modified >
'${dataimporter.last_index_time}'

deltaImportQuery:
select * from profil_view where uid='${dataimporter.delta.uid}'

When I look at the delta-import status, I see that the total number of
requests to the data source equals the number of modified rows.  Is it
possible to make only one request to the database and fetch all the
modifications at once?

select * from profil_view where uid in ('${dataimporter.delta.ALLuid}')
(something like that).
-- 
View this message in context: http://www.nabble.com/Delta-import-tp22663196p22663196.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delta import

Posted by AlexxelA <al...@canoe.ca>.
I found the problem. I was querying a MySQL view, and it seems the view
doesn't take into account the index I had on the last_modified field of the
original table ><.  Each MySQL call was taking 1 sec :|

I just switched back to a query with a join instead of querying my view.

Now it's doing around 400 updates / sec instead of 1 update / sec :)

Thanks
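
For readers hitting the same symptom, one way to check whether a view is
hiding an index is to compare MySQL's query plans for the view and for the
equivalent join. This is only a sketch: the thread does not show how
profil_view is defined, so the base tables profil and profil_detail, the join
condition, and the timestamp literal are all hypothetical.

```sql
-- Hypothetical schema: profil and profil_detail stand in for whatever
-- base tables profil_view is really built on.

-- Plan for the delta query when it goes through the view.
EXPLAIN
SELECT uid
FROM profil_view
WHERE last_modified > '2009-03-25 15:00:00';

-- Plan for the same lookup written as a join on the base tables.
EXPLAIN
SELECT p.uid
FROM profil AS p
JOIN profil_detail AS d ON d.uid = p.uid
WHERE p.last_modified > '2009-03-25 15:00:00';
```

In the EXPLAIN output, the "key" column shows which index, if any, is used
for each table. If the plan for the view shows no key for the last_modified
filter while the join uses the index, querying the base tables directly is
likely the faster option.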





Re: Delta import

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Hi Alex, you may be able to use CachedSqlEntityProcessor. You can do a
delta-import using full-import:
http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta

The inner entity can use a CachedSqlEntityProcessor.
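
A rough sketch of the FAQ's approach, adapted to the profil_view query from
this thread. The adaptation is an assumption; only the view and column names
come from the thread. The statement would go in the entity's query attribute,
and the ${...} placeholders are filled in by the DataImportHandler before the
SQL reaches MySQL:

```sql
-- Entity query for running a delta through the full-import command.
-- When the request passes clean=false, the first condition is false, so
-- only rows modified since the last index time are returned, in one query.
-- For any other value of clean (or none), the first condition is true and
-- every row is returned, as in a normal full import.
SELECT *
FROM profil_view
WHERE '${dataimporter.request.clean}' != 'false'
   OR last_modified > '${dataimporter.last_index_time}'
```

It would be triggered with command=full-import&clean=false instead of
command=delta-import, so the changed rows come back from a single request
rather than one request per uid.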

On Thu, Mar 26, 2009 at 1:45 AM, AlexxelA <al...@canoe.ca> wrote:
> Is it possible to reach those kinds of numbers with the
> DataImportHandler?



-- 
--Noble Paul

Re: Delta import

Posted by AlexxelA <al...@canoe.ca>.
Yes, my database is remote: MySQL 5, and I'm using Connector/J 5.1.7.  My
index has 20000 documents.  When I try to do, let's say, 14 updates, it takes
about 18 sec in total.  Here's the resulting log of the operation:

2009-03-25 15:53:57 org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 411
2009-03-25 15:53:59 org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed ModifiedRowKey for Entity: profil rows obtained : 14
2009-03-25 15:53:59 org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed DeletedRowKey for Entity: profil rows obtained : 0
2009-03-25 15:53:59 org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed parentDeltaQuery for Entity: profil
2009-03-25 15:54:00 org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
       
commit{dir=/home/solr-tomcat/solr/data/index,segFN=segments_sb,version=1237322897338,generation=1019,filenames=[_uj.frq,
_uj.fdx, _uj.tii, _uj.nrm, _uj.tis, _uj.fnm, _uj.prx, segments_sb, _uj.fdt]
2009-03-25 15:54:00 org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1237322897338
2009-03-25 15:54:13 org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully BOTTLE NECK
2009-03-25 15:54:13 org.apache.solr.handler.dataimport.DocBuilder commit
INFO: Full Import completed successfully
2009-03-25 15:54:13 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)
2009-03-25 15:54:15 org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2
       
commit{dir=/home/solr-tomcat/solr/data/index,segFN=segments_sb,version=1237322897338,generation=1019,filenames=[_uj.frq,
_uj.fdx, _uj.tii, _uj.nrm, _uj.tis, _uj.fnm, _uj.prx, segments_sb, _uj.fdt]
       
commit{dir=/home/solr-tomcat/solr/data/index,segFN=segments_sc,version=1237322897339,generation=1020,filenames=[_ul.prx,
_ul.fnm, _ul.tii, _ul.fdt, _ul.nrm, _ul.fdx, _ul.tis, _ul.frq, segments_sc]
2009-03-25 15:54:15 org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1237322897339
2009-03-25 15:54:15 org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@3da850 main

When I do a full-import it is much faster: it takes about 1 min to index
20000 documents.  I tried to play a bit with the config, but nothing seems to
work for the moment.

What I want to do is pretty interactive: my production DB has 1.2M documents
and must be able to delta-import around 2k updates every 5 min.  Is it
possible to reach those kinds of numbers with the DataImportHandler?






Re: Delta import

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Mar 25, 2009 at 2:25 AM, AlexxelA <al...@canoe.ca> wrote:

> OK, I'm fine with the fact that Solr is going to make X requests to the
> database for X updates, but when I run the delta-import command with
> 20000 rows to update, is it normal that it is really slow
> (~1 document fetched / sec)?

Not really, I've seen 1000x faster. Try firing a few of those queries on the
database directly. Are they slow? Is the database remote?

-- 
Regards,
Shalin Shekhar Mangar.

Re: Delta import

Posted by AlexxelA <al...@canoe.ca>.
OK, I'm fine with the fact that Solr is going to make X requests to the
database for X updates, but when I run the delta-import command with 20000
rows to update, is it normal that it is really slow (~1 document fetched /
sec)?






Re: Delta import

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Not really possible.

That may not be useful to a lot of users, because there may be too many
changed IDs and the 'IN' part can be really long.

You can raise an issue anyway.



On Mon, Mar 23, 2009 at 9:30 PM, AlexxelA <al...@canoe.ca> wrote:
> Is it possible to make only one request to the database and fetch all
> the modifications at once?



-- 
--Noble Paul