Posted to solr-user@lucene.apache.org by "Thompson,Roger" <th...@oclc.org> on 2007/09/14 14:19:43 UTC
Batch indexing a large number of records
Hi there!
I am embarking on re-engineering an application using Solr/Lucene (if
you'd like to see the current manifestation, go to:
fictionfinder.oclc.org). The database for this application consists of
approximately 1.4 million "work" records of varying size, plus another
database of 1.9 million bibliographic records. I fear that loading this
through HTTP will take several days, perhaps a week. Do any of you have
a way to do a large batch load of the DB?
Roger Thompson
Re: Batch indexing a large number of records
Posted by Mike Klaas <mi...@gmail.com>.
On 14-Sep-07, at 5:19 AM, Thompson,Roger wrote:
> Hi there!
>
> I am embarking on re-engineering an application using Solr/Lucene (if
> you'd like to see the current manifestation, go to:
> fictionfinder.oclc.org). The database for this application consists of
> approximately 1.4 million "work" records of varying size, plus another
> database of 1.9 million bibliographic records. I fear that loading this
> through HTTP will take several days, perhaps a week. Do any of you have
> a way to do a large batch load of the DB?
I can index 2 million web documents in 7 hours over HTTP. Just batch
a few (say, 10) docs per HTTP POST, and use around N+1 threads
(N = number of processors).
-Mike
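
A minimal sketch of the batched, multi-threaded loading Mike describes,
assuming Python and the classic XML update handler. The update URL, field
names, and sample records here are placeholders for illustration, not from
the thread:

    # Batched, multi-threaded indexing over HTTP.
    # Assumes a local Solr exposing the XML update handler at /solr/update;
    # the field names (id, title) are placeholders.
    import os
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor
    from xml.sax.saxutils import escape

    SOLR_UPDATE_URL = "http://localhost:8983/solr/update"
    BATCH_SIZE = 10                          # a few docs per POST
    THREADS = (os.cpu_count() or 1) + 1      # N+1 threads, N = # processors

    def post_batch(docs):
        """POST one <add> block containing a small batch of documents."""
        body = "<add>"
        for doc in docs:
            fields = "".join(
                '<field name="%s">%s</field>' % (name, escape(str(value)))
                for name, value in doc.items()
            )
            body += "<doc>%s</doc>" % fields
        body += "</add>"
        req = urllib.request.Request(
            SOLR_UPDATE_URL,
            data=body.encode("utf-8"),
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()

    def index_all(records):
        """Split records into small batches and index them concurrently."""
        batches = [records[i:i + BATCH_SIZE]
                   for i in range(0, len(records), BATCH_SIZE)]
        with ThreadPoolExecutor(max_workers=THREADS) as pool:
            list(pool.map(post_batch, batches))  # force, surface errors

    if __name__ == "__main__":
        sample = [{"id": str(i), "title": "work %d" % i} for i in range(100)]
        index_all(sample)
        # Commit once at the end rather than per batch.
        urllib.request.urlopen(urllib.request.Request(
            SOLR_UPDATE_URL,
            data=b"<commit/>",
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )).read()

Small batches bound the memory per request while the thread pool keeps all
processors busy; a single commit at the end avoids paying the commit cost
on every batch.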
Re: Batch indexing a large number of records
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 14, 2007, at 8:19 AM, Thompson,Roger wrote:
> I am embarking on re-engineering an application using Solr/Lucene (if
> you'd like to see the current manifestation, go to:
> fictionfinder.oclc.org). The database for this application consists of
> approximately 1.4 million "work" records of varying size, plus another
> database of 1.9 million bibliographic records. I fear that loading this
> through HTTP will take several days, perhaps a week. Do any of you have
> a way to do a large batch load of the DB?
It won't take that long. Send multiple documents per POST and perhaps
commit after every big batch. I ingested 3.8M binary MARC records in a
pretty crude way in less than a day.
But the fastest way to ingest data into Solr out of the box, I think,
is to use the CSV import capability. I've indexed 1.8M
bibliographic-sized records in 18 minutes with the CSV uploader,
pointing it at a local file.
Erik
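
A minimal sketch of the CSV load Erik describes, assuming the stock CSV
update handler at /solr/update/csv. The file path is a placeholder;
stream.file reads a file local to the Solr server, and depending on the
Solr version remote streaming may need to be enabled in solrconfig.xml:

    # Trigger Solr's CSV loader against a file on the server's local disk.
    # The handler path and parameters follow the stock CSV update handler;
    # the file path is a placeholder, not from the thread.
    import urllib.parse
    import urllib.request

    params = urllib.parse.urlencode({
        "stream.file": "/data/bib_records.csv",   # local to the Solr server
        "stream.contentType": "text/csv;charset=utf-8",
        "commit": "true",                         # commit when the load ends
    })
    url = "http://localhost:8983/solr/update/csv?" + params
    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode("utf-8"))

Because Solr reads the file directly from disk, this path skips the
per-request HTTP overhead entirely, which is why it is so much faster
than POSTing documents one batch at a time.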