Posted to solr-user@lucene.apache.org by Carey Sublette <cs...@local.com> on 2011/11/09 21:06:23 UTC

Importing Big Data From Berkeley DB to Solr

Hi:

I have a massive data repository (hundreds of millions of records) stored in Berkeley DB with Java code to access it, and I need an efficient method to import it into Solr for indexing. I cannot find a straightforward Java data-import API with which to load the data.

There is no JDBC driver for the DataImportHandler to call, and the data is not a simple file; the inefficiency (and extra code) of submitting it as HTTP calls, XML feeds, etc. makes those measures of last resort only.

Can I call a Lucene API in a Solr installation to do this somehow?

Thanks

RE: Importing Big Data From Berkeley DB to Solr

Posted by Carey Sublette <cs...@local.com>.
Thanks Otis:

It looks like SolrJ is exactly what I was looking for; it is also nice to know that the CSV import is fast as a fallback.

Re: Importing Big Data From Berkeley DB to Solr

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Carey,

Some options:
* Just read your BDB and use SolrJ to index to Solr in batches and in parallel (see the sketch below)
* Dump your BDB into CSV format and use Solr's fast CSV import
* Use Hadoop MapReduce to index to Lucene or Solr in parallel
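
For the first option, a bare-bones SolrJ sketch might look like the code below. It assumes the CommonsHttpSolrServer client from the SolrJ of this era; the field names and the stand-in documents are only placeholders, and in a real importer you would wrap your Berkeley DB cursor in an Iterator that builds one SolrInputDocument per record.

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BdbToSolr {

    /** Sends documents to Solr in fixed-size batches to limit HTTP round trips. */
    static void indexInBatches(SolrServer solr, Iterator<SolrInputDocument> docs, int batchSize)
            throws Exception {
        Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(batchSize);
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() >= batchSize) {
                solr.add(batch);   // one request per batch, not per document
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }
        solr.commit();             // make the new documents searchable
    }

    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Stand-in data so the sketch compiles on its own; in the real importer,
        // feed indexInBatches() from an Iterator over your Berkeley DB cursor that
        // builds one SolrInputDocument per record (field names must match the schema).
        Collection<SolrInputDocument> sample = new ArrayList<SolrInputDocument>();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "rec-1");
        doc.addField("title", "example record");
        sample.add(doc);

        indexInBatches(solr, sample.iterator(), 1000);
    }
}

One way to parallelize is to run several such loops over disjoint key ranges of the BDB store, each with its own SolrServer instance, and commit once at the end.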

Yes, you can index using Lucene APIs directly, but you will have to make sure all the analysis you specify there is identical to what you have in your Solr schema.
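
For concreteness, direct Lucene indexing (assuming Lucene 3.x, which is what current Solr releases ship with) would look roughly like the sketch below; the index path, analyzer, field names, and field options are only placeholders and would have to mirror exactly what your Solr schema does for each field.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class DirectLuceneIndexer {
    public static void main(String[] args) throws Exception {
        // Writing straight into a core's data/index directory; the Solr core
        // must not hold the write lock on it at the same time.
        Directory dir = FSDirectory.open(new File("/path/to/solr/data/index"));

        // The analyzer (and per-field analysis) must match the Solr schema,
        // otherwise queries through Solr will not line up with the indexed terms.
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_34,
                new StandardAnalyzer(Version.LUCENE_34));
        IndexWriter writer = new IndexWriter(dir, cfg);

        Document doc = new Document();
        doc.add(new Field("id", "rec-1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("body", "example text", Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);

        writer.close();   // commits and releases the write lock
    }
}

Solr will also not see the new segments until its searcher is reopened (for example by reloading the core), which is one more reason the SolrJ route above is usually simpler.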

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

