Posted to solr-user@lucene.apache.org by Pranav Prakash <pr...@gmail.com> on 2011/10/04 15:17:34 UTC

How to achieve Indexing @ 270GiB/hr

Greetings,

While going through the article 265% indexing speedup with Lucene's
concurrent flushing<http://java.dzone.com/news/265-indexing-speedup-lucenes?mz=33057-solr_lucene>,
I was stunned by the possibilities for increasing indexing speed.

I'd like to take inputs from everyone here on how to achieve this speed. As
far as I understand, there are two broad ways of feeding data to Solr -

   1. Using DataImportHandler
   2. Using HTTP to POST docs to Solr.

The speeds at which the article describes indexing seem too high to expect
from the second approach. Or is it achievable with multiple instances
feeding docs to Solr concurrently?
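On the multiple-instances point, one common pattern is several client
threads, each posting its own batches. A rough sketch (names are
illustrative assumptions; `post_batch` is a placeholder for the actual HTTP
POST to Solr):

```python
# Sketch: several worker threads posting document batches concurrently.
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 500  # matches the batch size used in our setup

def make_batches(docs, size=BATCH_SIZE):
    """Split a list of docs into fixed-size batches."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]

def post_batch(batch):
    # In real use this would POST the batch's XML to Solr's update
    # handler; here it just returns the batch size as a placeholder.
    return len(batch)

def index_concurrently(docs, workers=4):
    """Feed batches to Solr from several threads at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(post_batch, make_batches(docs)))
```

Since Solr handles concurrent update requests, the client-side batching and
thread count are usually what you tune first.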

My current setup does the following -

   1. Execute SQL queries to create a database of documents that need to be
   fed.
   2. Go through the columns one by one, create XML for them, and send
   it over to Solr in batches of at most 500 docs.
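For reference, step 2 can be sketched roughly as follows. The endpoint URL
and field names are illustrative assumptions, not taken from our actual
setup:

```python
# Minimal sketch of building a batch <add> payload for Solr's XML
# update handler and POSTing it.
from xml.sax.saxutils import escape
import urllib.request

SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # assumed endpoint

def docs_to_xml(docs):
    """Turn a list of {field: value} dicts into one <add> payload."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # escape() guards against '&', '<', '>' in field values
            parts.append('<field name="%s">%s</field>'
                         % (escape(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

def post_batch(docs):
    """POST one batch of up to 500 docs; returns Solr's response body."""
    req = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=docs_to_xml(docs).encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"})
    return urllib.request.urlopen(req).read()
```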


Even when using DataImportHandler, what are the ways this could be optimized?
If I can solve the problem of indexing data quickly in our current setup, my
life would become a lot easier.
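One commonly cited DIH optimization for SQL-backed imports is enabling
result streaming on the JDBC data source, so the driver does not buffer the
whole result set in memory. A sketch of a data-config.xml (connection
details and field names are placeholders, not from our setup):

```xml
<dataConfig>
  <!-- batchSize="-1" turns on MySQL result streaming in DIH's
       JdbcDataSource, avoiding an out-of-memory buffer of all rows -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="user" password="pass"
              batchSize="-1"/>
  <document>
    <entity name="doc" query="SELECT id, title, body FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="body" name="body"/>
    </entity>
  </document>
</dataConfig>
```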


*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
Google <http://www.google.com/profiles/pranny>