Posted to solr-user@lucene.apache.org by Israel Ekpo <is...@gmail.com> on 2009/10/30 18:04:56 UTC

Re: adding and updating a lot of document to Solr, metadata extraction etc

On Fri, Oct 30, 2009 at 11:23 AM, Eugene Dzhurinsky <bo...@redwerk.com> wrote:

> Hi there!
>
> We are trying to evaluate Apache Solr for our custom search implementation,
> which includes the following requirements:
>
> - ability to add/update/delete a lot of documents at once
>
> - ability to iterate over all documents returned by a search, as Lucene
>   allows with a HitCollector instance. We would need to extract and
>   aggregate various fields stored in the index, to group results and
>   aggregate them in some way.
>
> After reading the tutorial I realized that documents are added and removed
> by passing an XML file to the controller in a POST request. However, our
> XML files may be very, very large, so I hope there is some other option
> that avoids going through the HTTP protocol.
>
> Also, I did not find any way in the tutorial to access the search results
> with all fields, so that our application can process them.
>
> I think I simply did not read the documentation well or missed some point,
> so can somebody please point me to articles that explain the basics of how
> to achieve my goals?
>
> Thank you very much in advance!
>
> --
> Eugene N Dzhurinsky
>

Hi Eugene

Solr has an embedded version, but you are encouraged to use the standard web
service interfaces.
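
With the standard HTTP interface you do not have to post one huge XML file.
A minimal sketch, in Python using only the standard library, of splitting the
documents into batches, sending each batch as its own <add> message and
committing once at the end. The update URL below is the one used by the
tutorial's example server and is an assumption about your setup; the helper
names and the batch size are made up for illustration.

```python
from itertools import islice
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape, quoteattr

# Assumption: update handler of the tutorial's example server.
UPDATE_URL = "http://localhost:8983/solr/update"


def add_message(docs):
    """Build one <add> message from an iterable of {field: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr/escape take care of &, <, > and quotes.
            parts.append("<field name=%s>%s</field>"
                         % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def post_xml(xml, url=UPDATE_URL):
    """POST one XML message to the update handler, return the response body."""
    req = Request(url, data=xml.encode("utf-8"),
                  headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req) as resp:
        return resp.read()


def index_all(docs, size=1000, send=post_xml):
    """Send documents in batches, then issue a single commit."""
    count = 0
    for chunk in batches(docs, size):
        send(add_message(chunk))
        count += len(chunk)
    send("<commit/>")
    return count
```

Because `docs` can be a generator, the whole data set never has to be in
memory at once; `send` can be swapped for a stub to try this without a
running server.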

Also, the recently released Solr 1.4 white paper describes the
StreamingUpdateSolrServer, which according to the white paper can index
up to 25K documents per second.
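
As far as I understand from the javadoc, that class buffers added documents
in a queue and uses several threads to write them to the server. The sketch
below shows that general pattern in plain Python; it is not SolrJ's code,
and the function name, parameter names and default values are hypothetical.
`send` stands for whatever actually delivers a document to Solr.

```python
import queue
import threading


def stream_updates(docs, send, queue_size=20, thread_count=3):
    """Feed docs through a bounded queue to worker threads calling send(doc).

    Returns the list of exceptions raised by send (empty if all went well).
    """
    q = queue.Queue(maxsize=queue_size)
    done = object()  # sentinel that tells a worker to stop
    errors = []

    def worker():
        while True:
            item = q.get()
            if item is done:
                return
            try:
                send(item)
            except Exception as exc:
                # Record the failure and keep draining the queue.
                errors.append(exc)

    workers = [threading.Thread(target=worker) for _ in range(thread_count)]
    for w in workers:
        w.start()
    for doc in docs:
        q.put(doc)  # blocks while the queue is full
    for _ in workers:
        q.put(done)  # one sentinel per worker
    for w in workers:
        w.join()
    return errors
```

The bounded queue keeps the producer from running far ahead of the network,
so memory use stays flat no matter how many documents are fed in.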

The white paper can be downloaded here:

http://www.lucidimagination.com/whitepaper/whats-new-in-solr-1-4

Info about StreamingUpdateSolrServer is available here:

http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html

If you are still interested in the embedded version as a way to avoid HTTP,
you can check out the following links:

http://wiki.apache.org/solr/EmbeddedSolr

http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html

I hope this helps.

-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.