Posted to solr-user@lucene.apache.org by uwe72 <uw...@exxcellent.de> on 2012/11/19 20:54:56 UTC

Inserting many documents and update relations

Hi there,

I have a general question.

We have around 5 million Lucene documents.

Initially we have around 4000 XML files, which we transform into
SolrInputDocuments using SolrJ and add to the index.

Documents are also related to other documents, so while adding a document we
have to run some queries (at least one) to identify whether related
documents are already in the cache, so that we can link to them. Each
related document also has a "backlink", so we have to update the related
document as well (that is: load it, update it, delete it and re-add it).
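
To make the cost of this cycle concrete, here is a minimal sketch in plain Java (no SolrJ) that replays it against a hypothetical in-memory map standing in for the index. It assumes relations are symmetric (each document stores its own links and its backlinks in one field) and that an update means rewriting the whole document, as described above. The class and method names are made up for illustration. As a side note, assuming the schema declares a uniqueKey, re-adding a document with the same key replaces the old version, so the explicit delete step is usually unnecessary.

```java
import java.util.*;

// Replays the incremental workflow from the question against a hypothetical
// in-memory map (id -> related ids) that stands in for the Solr index.
// It talks to no server; it only counts queries and full-document writes.
class IncrementalIndexer {
    final Map<String, Set<String>> index = new HashMap<>();
    int lookups = 0; // queries sent to the index
    int writes = 0;  // full documents (re-)added

    void add(String id, Collection<String> relatedIds) {
        Set<String> related = new TreeSet<>(relatedIds);

        // One query: which already-indexed documents point at this one?
        // Their ids become backlinks on the new document.
        lookups++;
        for (Map.Entry<String, Set<String>> e : index.entrySet()) {
            if (e.getValue().contains(id)) {
                related.add(e.getKey());
            }
        }

        // One query per related id: if that document is already indexed,
        // add the backlink to it and rewrite the whole document.
        for (String rel : relatedIds) {
            lookups++;
            Set<String> other = index.get(rel);
            if (other != null && other.add(id)) {
                writes++;
            }
        }

        index.put(id, related);
        writes++;
    }

    public static void main(String[] args) {
        IncrementalIndexer ix = new IncrementalIndexer();
        ix.add("A", Arrays.asList("B"));
        ix.add("B", Arrays.asList("C"));
        ix.add("C", Arrays.asList("A")); // forces a rewrite of A
        System.out.println("lookups=" + ix.lookups + " writes=" + ix.writes);
        // prints "lookups=6 writes=4"
    }
}
```

Even with three documents, every add costs at least one query and the third add forces a rewrite of the first; at 5 million documents those round trips and rewrites add up.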

We are using solr 3.6.1.

Performance is quite slow because of these queries and the modifications of
documents that are already in the cache.

Are there any configuration changes we can make, or anything else we can do?

Thanks a lot in advance.





--
View this message in context: http://lucene.472066.n3.nabble.com/Inserting-many-documents-and-update-relations-tp4021151.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Inserting many documents and update relations

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello,
I propose joining the docs externally, e.g. in a tiny RDBMS: just put the ids
there and keep the content in files. Then DIH should, I believe (though I'm
not certain), be able to build the full document representation from the
joined entities.

As an alternative, you can index the documents as they are, with id
references between them, into a separate Solr core, and then index the
joined docs into another core with DIH's SolrEntityProcessor, querying the
first core with a join query (http://wiki.apache.org/solr/Join).
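
The second suggestion relies on the join query parser from the linked wiki page, with queries of the form {!join from=... to=...}inner-query: it collects the values of the "from" field from every document matching the inner query and returns the documents whose "to" field holds one of those values. The sketch below models only those semantics in plain Java; it does not call Solr, and the core contents and the field names "id" and "related_ids" are hypothetical. Note that the join parser may not be available in Solr 3.6.1 (the version mentioned in the question), so it is worth checking the wiki page for the version that introduced it.

```java
import java.util.*;
import java.util.function.Predicate;

// Plain-Java model of what a query of the form {!join from=F to=T}inner does:
// gather the values of field F from every document matching the inner query,
// then return every document whose field T holds one of those values.
// Documents are modelled as field -> values; field names are hypothetical.
class JoinSemantics {

    static List<Map<String, List<String>>> join(List<Map<String, List<String>>> docs,
            String from, String to, Predicate<Map<String, List<String>>> inner) {
        Set<String> keys = new HashSet<>();
        for (Map<String, List<String>> d : docs) {
            if (inner.test(d)) {
                keys.addAll(d.getOrDefault(from, Collections.emptyList()));
            }
        }
        List<Map<String, List<String>>> result = new ArrayList<>();
        for (Map<String, List<String>> d : docs) {
            for (String v : d.getOrDefault(to, Collections.emptyList())) {
                if (keys.contains(v)) {
                    result.add(d);
                    break;
                }
            }
        }
        return result;
    }

    // Hypothetical document: single-valued "id", multi-valued "related_ids".
    static Map<String, List<String>> doc(String id, String... related) {
        Map<String, List<String>> d = new HashMap<>();
        d.put("id", Collections.singletonList(id));
        d.put("related_ids", Arrays.asList(related));
        return d;
    }

    static List<String> ids(List<Map<String, List<String>>> docs) {
        List<String> out = new ArrayList<>();
        for (Map<String, List<String>> d : docs) {
            out.add(d.get("id").get(0));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> core = new ArrayList<>();
        core.add(doc("A", "B"));
        core.add(doc("B", "C"));
        core.add(doc("C", "A"));
        Predicate<Map<String, List<String>>> isA = d -> d.get("id").contains("A");
        // Like {!join from=related_ids to=id}id:A : documents that A links to
        System.out.println(ids(join(core, "related_ids", "id", isA))); // [B]
        // Like {!join from=id to=related_ids}id:A : documents linking back to A
        System.out.println(ids(join(core, "id", "related_ids", isA))); // [C]
    }
}
```

Because the backlink direction can be expressed as a query at search time, the raw core would not need backlinks written into it while indexing.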

On 19.11.2012 23:55, "uwe72" <uw...@exxcellent.de> wrote:
