Posted to solr-user@lucene.apache.org by Alessandro Benedetti <ab...@apache.org> on 2015/10/09 17:44:52 UTC

[SolrJ] Indexing Java Map into Solr

Hi guys,
I was evaluating an Indexer application.
This application takes as input a Collection of objects that are basically
Java Maps.
On the Solr side this covers a big group of dynamic fields, and it
avoids that complexity on the Java side.

Let's get to the point: currently the indexing approach is through the JSON
update handler. It means that each object is serialised to JSON
and sent in a batch to the update handler.

If possible I would like to speed this up by moving to javabin indexing (as
discussed previously).
But apparently it is slower (I guess because of the conversion between the
Map object and the SolrInputDocument that needs to happen before indexing,
and that is not straightforward given the business logic).

Is there any better way to index Java Maps, apart from simply converting them to
SolrInputDocuments on the fly before sending them to the SolrClient?
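
Just for clarity, the on-the-fly conversion I mean is roughly the sketch
below (assuming solr-solrj on the classpath; client construction and field
handling are illustrative, not our actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class MapIndexer {

    // Convert one Map into a SolrInputDocument, field by field
    // (the dynamic field names come straight from the map keys).
    static SolrInputDocument toDoc(Map<String, Object> map) {
        SolrInputDocument doc = new SolrInputDocument();
        for (Map.Entry<String, Object> e : map.entrySet()) {
            doc.addField(e.getKey(), e.getValue());
        }
        return doc;
    }

    // Convert the whole collection and send it as one batch.
    static void index(SolrClient client, List<Map<String, Object>> maps) throws Exception {
        List<SolrInputDocument> docs = new ArrayList<>(maps.size());
        for (Map<String, Object> m : maps) {
            docs.add(toDoc(m));
        }
        client.add(docs);
    }
}
```

(With a plain HttpSolrClient you'd also need to set a BinaryRequestWriter
to actually get javabin on the wire, if I remember correctly.)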

Cheers

-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: [SolrJ] Indexing Java Map into Solr

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, what does the Java code look like? One of the cardinal sins
of indexing with SolrJ is sending docs one at a time rather than in
batches of at least 100 (I usually use 1,000). See:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/
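
The batching itself is trivial; a pure-Java sketch (the `send` callback
here is a stand-in for the real client.add(batch) call, just so the
partitioning is visible without a live Solr):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchIndexer {

    // Split the input into fixed-size batches and hand each one to the
    // sender. In real code the sender would be:
    //   batch -> client.add(toDocs(batch))
    // Returns the number of batches sent.
    static <T> int sendInBatches(List<T> items, int batchSize, Consumer<List<T>> send) {
        int batches = 0;
        for (int i = 0; i < items.size(); i += batchSize) {
            List<T> batch = items.subList(i, Math.min(i + batchSize, items.size()));
            send.accept(new ArrayList<>(batch));   // copy: subList is only a view
            batches++;
        }
        return batches;
    }
}
```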

One technique I often use to chase this kind of thing down:
comment out just the server.add() call. That determines whether
the time is spent acquiring the docs or actually sending them to
Solr; at the very least it tells you where to start looking.

If commenting that out substantially speeds up your throughput _and_
you're batching, then check the CPU utilization. If it's not very high, you
can add a bunch more clients/threads.
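
A sketch of that fan-out (pure Java; the send callback stands in for the
real client.add call, and the pool size is whatever your hardware tolerates):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class ParallelIndexer {

    // Fan pre-built batches out over a fixed pool of worker threads.
    // A single SolrJ client can usually be shared across the threads.
    static <T> void indexParallel(List<List<T>> batches, int threads,
                                  Consumer<List<T>> send) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<T> batch : batches) {
            pool.submit(() -> send.accept(batch));
        }
        pool.shutdown();                          // no new tasks accepted
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for in-flight batches
    }
}
```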

Bottom line: I'm doubtful that parsing your input is all that expensive, but
what do I know? Until you can actually pinpoint where the time is being
spent it's all guesswork, so a profiler seems in order.

Best,
Erick

On Fri, Oct 9, 2015 at 8:44 AM, Alessandro Benedetti
<ab...@apache.org> wrote:
> Hi guys,
> I was evaluating an Indexer application.
> [...]