Posted to solr-user@lucene.apache.org by manju16832003 <ma...@gmail.com> on 2014/02/12 11:09:31 UTC

Indexing strategies?

Hi,
I'm trying to decide on an indexing strategy.
My application architecture is:
 - I have a listing table in my DB
 - For each listing, I make 3 calls to a URL data source on a different system

 I have 200k records

 Indexing 25 docs takes 1 minute, so 200k docs would take more than
100 hours (about 133 hours at that rate) :-(

 
 I know there are a lot of factors to consider, from the network to the DB.
I'm looking for different strategies we could use to speed up indexing.

 - Can we run multiple data import handlers? One data-config for the first
100k records and a second one for the other 100k.
 - Would it be possible to write a Java service using SolrJ that makes
multi-threaded calls to Solr to index?
 - The URL data sources I'm using are actually backed by an MSSQL database on
a different system. Could I speed up indexing by using JdbcDataSource to
query that DB directly instead of going through the API URL data source?
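
For the SolrJ option, here is a minimal sketch of what multi-threaded batching could look like. It uses only the JDK. `BatchSink` is a hypothetical stand-in for whatever call actually sends documents to Solr (for example a SolrJ client's `add` call), and the class name, batch size and thread count are assumptions for illustration, not tuned values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelIndexer {

    /** Hypothetical stand-in for the call that sends one batch of documents to Solr. */
    interface BatchSink {
        void send(List<String> batch) throws Exception;
    }

    /** Splits the ids into consecutive batches of at most batchSize elements. */
    static List<List<String>> partition(List<String> ids, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            int end = Math.min(i + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(i, end)));
        }
        return batches;
    }

    /** Sends every batch through the sink on a fixed pool of worker threads; returns the batch count. */
    static int indexAll(List<String> ids, int batchSize, int threads, BatchSink sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> batch : partition(ids, batchSize)) {
                pending.add(pool.submit(() -> {
                    sink.send(batch);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for the batch and rethrows any worker failure
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

Threads only help while the slow part (here, the per-listing remote calls) can actually run in parallel; if the remote system is the bottleneck, more threads may just move the queue.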

Are there any other strategies we could use?

Thank you,
 



--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-strategies-tp4116852.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Indexing strategies?

Posted by manju16832003 <ma...@gmail.com>.

Hi Erick,
Thank you very much, those are valuable suggestions :-).
I will give them a try.
Appreciate your time.


Re: Indexing strategies?

Posted by Erick Erickson <er...@gmail.com>.

I'd seriously consider a SolrJ program that pulled the necessary data from
two of your systems, held it in cache and then pulled the data from your
main system and enriched it with the cached data.

Or export your information from your remote systems and import it into
a single system where you could do joins.

I believe DIH has some caching ability too that you might consider.

Your basic problem is an inefficient data model where you have to query
these different systems on a row-by-row basis; that's where I'd concentrate
my energies.
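
As a rough sketch of the cache-and-enrich idea, assuming the data from the remote systems fits in memory and can be fetched in bulk: each remote system is loaded once into a map keyed by listing id, and every listing is then enriched by in-memory lookups instead of per-row remote calls. The class name, field names and map layout are hypothetical, and the step that sends the resulting documents to Solr is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CachedEnricher {

    /**
     * Merges each listing with the extra fields cached for its id.
     * listings: id -> fields from the main listing table.
     * caches:   one map per remote system, loaded in bulk beforehand (id -> extra fields).
     */
    static List<Map<String, String>> enrich(Map<String, Map<String, String>> listings,
                                            List<Map<String, Map<String, String>>> caches) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : listings.entrySet()) {
            Map<String, String> doc = new HashMap<>(entry.getValue());
            doc.put("id", entry.getKey());
            for (Map<String, Map<String, String>> cache : caches) {
                Map<String, String> extra = cache.get(entry.getKey());
                if (extra != null) {
                    doc.putAll(extra); // in-memory lookup, no remote call per row
                }
            }
            docs.add(doc);
        }
        return docs;
    }
}
```

With this shape the number of remote round trips depends on the number of remote systems rather than on the number of listings.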

Best,
Erick

