Posted to solr-user@lucene.apache.org by kshitij tyagi <ks...@gmail.com> on 2016/08/16 08:33:39 UTC

Indexing (posting document) taking a lot of time

Hi,

I am indexing a lot of data, about 8GB, but it is taking a lot of time. I
have read about maxBufferedDocs, ramBufferSizeMB, merge policy, etc. in the
solrconfig file.

It would be helpful if someone could help me tune the settings for
faster indexing speeds.

*I have read the docs but could not work out what changing these configs
actually does.*


*Regards,*
*Kshitij*

Re: Indexing (posting document) taking a lot of time

Posted by Emir Arnautovic <em...@sematext.com>.
That is quite a big document! You need to monitor Solr to see whether you
are feeding it documents fast enough or saturating it with a large
number of large requests. Play with batch size and number of threads to
find the sweet spot. Maybe try the extremes first (one doc/one thread, one
doc/many threads, etc.); that might tell you more about what is slowing
things down. If you are not using any Solr/JVM/OS monitoring tool, one
would help you a lot in diagnosing the issue. One such tool is our SPM
(http://sematext.com/spm).
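
A minimal Python sketch of that batch-size/thread experiment. It is an
illustration only: the URL and the core name "mycore" are placeholders, and
the function names (chunked, post_batch, index_all) are made up for this
example. It assumes the documents are already JSON-serializable dicts and
posts each batch as a JSON array to Solr's /update handler.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def chunked(docs, batch_size):
    """Yield consecutive slices of docs, each at most batch_size long."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


def post_batch(batch, url="http://localhost:8983/solr/mycore/update"):
    """POST one batch as a JSON array (URL and core name are assumptions)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_all(docs, batch_size, threads, send=post_batch):
    """Send every batch through a pool of worker threads.

    Results come back in batch order. `send` is injectable so the
    batching logic can be tried without a running Solr.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(send, chunked(docs, batch_size)))
```

Varying batch_size and threads between runs (for example 1/1, 1/8, 100/1,
100/8) while watching Solr's heap, CPU and IO is one way to see which side
is the bottleneck.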

Regards,
Emir

On 16.08.2016 14:49, kshitij tyagi wrote:
> 400KB is the size of a single document, and I am sending 100 documents per request.
> Solr heap size is 16GB and we are running multithreaded.

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: Indexing (posting document) taking a lot of time

Posted by kshitij tyagi <ks...@gmail.com>.
I am posting JSON using curl.

On Wed, Aug 17, 2016 at 4:41 AM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> What format are those documents? Solr XML? Custom JSON?
>
> Or are you sending PDF/binary documents to Solr's extract handler and
> asking it to extract the useful content? If the latter, you
> could take that step out of Solr with a custom client using Tika (what
> Solr uses under the hood) and send only the processed output to Solr.
>
> Regards,
>    Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/

Re: Indexing (posting document) taking a lot of time

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
What format are those documents? Solr XML? Custom JSON?

Or are you sending PDF/binary documents to Solr's extract handler and
asking it to extract the useful content? If the latter, you
could take that step out of Solr with a custom client using Tika (what
Solr uses under the hood) and send only the processed output to Solr.

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 16 August 2016 at 22:49, kshitij tyagi <ks...@gmail.com> wrote:
> 400KB is the size of a single document, and I am sending 100 documents per request.
> Solr heap size is 16GB and we are running multithreaded.

Re: Indexing (posting document) taking a lot of time

Posted by kshitij tyagi <ks...@gmail.com>.
400KB is the size of a single document, and I am sending 100 documents per request.
Solr heap size is 16GB and we are running multithreaded.

On Tue, Aug 16, 2016 at 5:10 PM, Emir Arnautovic <
emir.arnautovic@sematext.com> wrote:

> Hi,
>
> 400KB/doc * 100 docs = 40MB. If you are running it single-threaded, Solr
> will sit idle while it accepts each relatively large request. Or is 400KB
> the size of the whole 100-doc bulk that you are sending?
>
> What is Solr's heap size? I would try increasing the number of threads and
> monitoring Solr's heap/CPU/IO to see where the bottleneck is.
>
> How complex is the fields' analysis?
>
> Regards,
> Emir

Re: Indexing (posting document) taking a lot of time

Posted by Emir Arnautovic <em...@sematext.com>.
Hi,

400KB/doc * 100 docs = 40MB. If you are running it single-threaded, Solr
will sit idle while it accepts each relatively large request. Or is 400KB
the size of the whole 100-doc bulk that you are sending?

What is Solr's heap size? I would try increasing the number of threads and
monitoring Solr's heap/CPU/IO to see where the bottleneck is.

How complex is the fields' analysis?
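
As a worked version of that arithmetic, here is a small Python sketch. It
assumes decimal units (1KB = 1000 bytes) and that every document is about
400KB; the thread only gives that size for a single document, and the 8GB
total comes from the original post. The function name bulk_estimate is made
up for this example.

```python
def bulk_estimate(doc_bytes, docs_per_request, total_bytes):
    """Return (bytes per request, total docs, total requests).

    Assumes all documents have the same size, which is a simplification.
    """
    request_bytes = doc_bytes * docs_per_request
    total_docs = total_bytes // doc_bytes
    # Ceiling division: a partial final batch still needs its own request.
    total_requests = -(-total_docs // docs_per_request)
    return request_bytes, total_docs, total_requests


request_bytes, total_docs, total_requests = bulk_estimate(
    doc_bytes=400_000,            # 400KB per document
    docs_per_request=100,         # batch size mentioned in the thread
    total_bytes=8_000_000_000,    # about 8GB of data overall
)
print(request_bytes // 1_000_000, "MB per request")  # 40 MB per request
print(total_docs, "documents")                       # 20000 documents
print(total_requests, "requests")                    # 200 requests
```

Under these assumptions the whole load is only about 200 requests of 40MB
each, so time spent uploading each request matters as much as time spent
indexing it.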

Regards,
Emir

On 16.08.2016 13:25, kshitij tyagi wrote:
> Hi,
>
> We are sending about 100 documents per request for indexing. We have
> autocommit set to false and commit only when 10000 documents are
> present. Solr and the machine sending the requests are in the same pool.



Re: Indexing (posting document) taking a lot of time

Posted by kshitij tyagi <ks...@gmail.com>.
Hi,

We are sending about 100 documents per request for indexing. We have
autocommit set to false and commit only when 10000 documents are
present. Solr and the machine sending the requests are in the same pool.
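
A minimal Python sketch of that commit policy, for illustration only. The
CommitTracker class is hypothetical and not the code actually used here; it
just counts documents sent and reports when an explicit commit is due.
COMMIT_BODY is the JSON commit command accepted by Solr's /update handler.

```python
import json


class CommitTracker:
    """Count documents sent and signal when an explicit commit is due."""

    def __init__(self, commit_every=10_000):
        self.commit_every = commit_every
        self.pending = 0  # documents sent since the last commit

    def added(self, doc_count):
        """Record doc_count newly sent docs; return True if a commit is due."""
        self.pending += doc_count
        if self.pending >= self.commit_every:
            self.pending = 0
            return True
        return False


# Body of an explicit commit request for the JSON update handler.
COMMIT_BODY = json.dumps({"commit": {}})
```

With 100 documents per request, added(100) returns True on every 100th
request. Any documents still pending at the end of the run need one final
commit.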



On Tue, Aug 16, 2016 at 4:51 PM, Emir Arnautovic <
emir.arnautovic@sematext.com> wrote:

> Hi,
>
> Do you send one doc per request? How frequently do you commit? Where is
> Solr running? What is the network connection between your machine and Solr?
> What are the JVM settings? Is 10-30s for the entire indexing or a single doc?
>
> Regards,
> Emir

Re: Indexing (posting document) taking a lot of time

Posted by Emir Arnautovic <em...@sematext.com>.
Hi,

Do you send one doc per request? How frequently do you commit? Where is
Solr running? What is the network connection between your machine and Solr?
What are the JVM settings? Is 10-30s for the entire indexing or a single doc?

Regards,
Emir

On 16.08.2016 11:34, kshitij tyagi wrote:
> Hi Alexandre,
>
> One document of 400KB is taking approx. 10-30 sec to index, and this varies.
> I am posting the documents using curl.



Re: Indexing (posting document) taking a lot of time

Posted by kshitij tyagi <ks...@gmail.com>.
Hi Alexandre,

One document of 400KB is taking approx. 10-30 sec to index, and this varies.
I am posting the documents using curl.

On Tue, Aug 16, 2016 at 2:11 PM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> How many records is that, and what is 'slow'? Also, is this a standalone
> or a cluster setup?

Re: Indexing (posting document) taking a lot of time

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
How many records is that, and what is 'slow'? Also, is this a standalone
or a cluster setup?

On 16 Aug 2016 6:33 PM, "kshitij tyagi" <ks...@gmail.com> wrote:

> Hi,
>
> I am indexing a lot of data, about 8GB, but it is taking a lot of time. I
> have read about maxBufferedDocs, ramBufferSizeMB, merge policy, etc. in the
> solrconfig file.
>
> It would be helpful if someone could help me tune the settings for
> faster indexing speeds.
>
> *I have read the docs but could not work out what changing these configs
> actually does.*
>
>
> *Regards,*
> *Kshitij*
>