Posted to solr-user@lucene.apache.org by Ravi Solr <ra...@gmail.com> on 2014/03/24 22:36:43 UTC

solr 4.x reindexing issues

Hello,
        We are trying to reindex as part of our move from 3.6.2 to 4.6.1
and have run into various issues reindexing 1.5 million docs. We don't use
SolrCloud; it's still a Master/Slave config. For testing this I am using a
single test server, reading from it and putting the docs back into the same
index.

We send docs in batches of 100, but only 10 out of every 100 are getting
indexed. Is this related to the hard-coded maxBufferedAddsPerServer setting?
I also tried to play with the autoCommit and autoSoftCommit settings, but in
vain.

    <autoCommit>
       <maxDocs>5</maxDocs>
       <maxTime>5000</maxTime>
       <openSearcher>true</openSearcher>
    </autoCommit>

    <autoSoftCommit>
        <maxTime>1000</maxTime>
    </autoSoftCommit>

I use these on the test system just to check whether docs are being indexed,
but even with a batch of 5 my SolrJ client code runs faster than the
indexing, causing some docs to not get indexed. The indexing function is a
recursive method (shown below) that fails after some time with a stack
overflow (I did not have this issue with 3.6.2 with the same code):

    private static void processDocs(HttpSolrServer server, Integer start,
            Integer rows) throws Exception {
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        query.addFilterQuery("-allfields:[* TO *]");
        QueryResponse resp = server.query(query);
        SolrDocumentList list = resp.getResults();
        Long total = list.getNumFound();

        if (list != null && !list.isEmpty()) {
            for (SolrDocument doc : list) {
                SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
                // To index full doc again
                iDoc.removeField("_version_");
                server.add(iDoc, 1000);
            }

            System.out.println("Indexed " + (start + rows) + "/" + total);
            if (total >= (start + rows)) {
                processDocs(server, (start + rows), rows);
            }
        }
    }

I also tried turning on the updateLog, but it was filling up so fast that it
was useless.

How do we do bulk updates in a Solr 4.x environment? Is there any setting
that I am missing?

Thanks

Ravi Kiran Bhaskar
Technical Architect
The Washington Post

Re: solr 4.x reindexing issues

Posted by Ravi Solr <ra...@gmail.com>.
Sorry guys, I really apologize for wasting your time... bone-headed coding on
my part. I did not set the rows and start parameters to the correct values
for proper pagination, so the query was returning the same 10 docs every
single time.
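
For the record, a minimal sketch of the fix: the paging parameters have to be
applied to the query itself, otherwise Solr falls back to its defaults
(start=0, rows=10) and every pass returns the same 10 docs.

    SolrQuery query = new SolrQuery();
    query.setQuery("*:*");
    query.addFilterQuery("-allfields:[* TO *]");
    query.setStart(start);   // this call was missing
    query.setRows(rows);     // this call was missing - the default is 10
    QueryResponse resp = server.query(query);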

Thanks
Ravi Kiran Bhaskar



Re: solr 4.x reindexing issues

Posted by Ravi Solr <ra...@gmail.com>.
I even tried reading from one core A and indexing into core B, and the same
issue still persists.



Re: solr 4.x reindexing issues

Posted by Lan <du...@gmail.com>.
Ravi,

It looks like you are re-indexing by pulling data from your Solr server and
then indexing it back into the same server. I can think of many things that
could go wrong with this setup. For example, are all of your fields stored?
And since you are iterating through all documents on the Solr server while
modifying the index at the same time, the sort order could change under you.

To make it easier to identify any bugs in your process, you should index
into a second Solr server that is *EMPTY*, so you can spot any problems.

Generally, when people re-index data they don't pull it from Solr but from a
system of record, such as a DB.



--
View this message in context: http://lucene.472066.n3.nabble.com/solr-4-x-reindexing-issues-tp4126695p4126986.html
Sent from the Solr - User mailing list archive at Nabble.com.
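
A rough SolrJ sketch of Lan's empty-second-server suggestion (the URLs and
core name here are hypothetical placeholders):

    // Read from the live server, write into a second, empty server, so the
    // source index is never modified while it is being paged through.
    HttpSolrServer source = new HttpSolrServer("http://solr-old:8080/solr/sitesearchcore");
    HttpSolrServer target = new HttpSolrServer("http://solr-new:8080/solr/sitesearchcore");

    SolrQuery query = new SolrQuery("*:*");
    query.setStart(0);
    query.setRows(100);
    for (SolrDocument doc : source.query(query).getResults()) {
        SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
        iDoc.removeField("_version_");   // let the target assign its own version
        target.add(iDoc);
    }
    target.commit();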

Re: solr 4.x reindexing issues

Posted by Ravi Solr <ra...@gmail.com>.
I am also seeing the following in the log. Is it really committing? Now I am
totally confused about how Solr 4.x indexes. My relevant update config is
shown below:

  <updateHandler class="solr.DirectUpdateHandler2">
    <maxPendingDeletes>1</maxPendingDeletes>
    <autoCommit>
       <maxDocs>100</maxDocs>
       <maxTime>120000</maxTime>
       <openSearcher>false</openSearcher>
    </autoCommit>
  </updateHandler>

[#|2014-03-25T13:44:03.765-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820509
[commitScheduler-6-thread-1] INFO  org.apache.solr.update.UpdateHandler  -
start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
|#]

[#|2014-03-25T13:44:03.766-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=83;_ThreadName=http-thread-pool-8080(4);|820510
[http-thread-pool-8080(4)] INFO
org.apache.solr.update.processor.LogUpdateProcessor  - [sitesearchcore]
webapp=/solr-admin path=/update params={wt=javabin&version=2}
{add=[09f693e6-9a6f-11e3-9900-dd917233cf9c]} 0 13
|#]

[#|2014-03-25T13:44:03.898-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820642
[commitScheduler-6-thread-1] INFO  org.apache.solr.core.SolrCore  -
SolrDeletionPolicy.onCommit: commits: num=3

commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9y68,generation=464192}

commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9yjf,generation=464667}

commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9yjg,generation=464668}
|#]

[#|2014-03-25T13:44:03.898-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820642
[commitScheduler-6-thread-1] INFO  org.apache.solr.core.SolrCore  - newest
commit generation = 464668
|#]

[#|2014-03-25T13:44:03.908-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820652
[commitScheduler-6-thread-1] INFO
org.apache.solr.search.SolrIndexSearcher  - Opening
Searcher@1e2ca86e[sitesearchcore]
realtime
|#]

[#|2014-03-25T13:44:03.909-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820653
[commitScheduler-6-thread-1] INFO  org.apache.solr.update.UpdateHandler  -
end_commit_flush
|#]

Thanks

Ravi Kiran Bhaskar



Re: solr 4.x reindexing issues

Posted by Ravi Solr <ra...@gmail.com>.
Thank you very much for responding, Mr. Høydahl. I removed the recursion,
which eliminated the stack overflow exception. However, I am still
encountering my main problem of docs not getting indexed in Solr 4.x, as I
mentioned in my original email. The reason I am reindexing is that
EnglishPorterFilterFactory has been removed in Solr 4.x, and I also wanted to
add another copyField of all field values into the destination "allfields".

As per your suggestion I removed the soft commit and set autoCommit to
maxDocs=100 and maxTime=120000. I was printing out each indexing call... you
can clearly see it still indexes only around 10 at a time (test code and
results shown below). My code ran to completion, and for good measure I
committed manually after 10 minutes; still, when I query I see that only
13513 docs got indexed.

There must be something else I am missing.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">allfields:[* TO *]</str>
      <str name="wt">xml</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="13513" start="0"/>
</response>
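
(The same check from SolrJ, as a rough sketch - a rows=0 query whose numFound
is the count of docs reindexed so far:)

    SolrQuery check = new SolrQuery("allfields:[* TO *]");
    check.setRows(0);   // only numFound is needed, no docs
    long indexed = server.query(check).getResults().getNumFound();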

TEST INDEXER CODE
 -------------------------------
        Long total = null;
        Integer start = 0;
        Integer rows = 100;
        while (total == null || total >= (start + rows)) {
            SolrQuery query = new SolrQuery();
            query.setQuery("*:*");
            query.setSort("displaydatetime", ORDER.desc);
            query.addFilterQuery("-allfields:[* TO *]");
            QueryResponse resp = server.query(query);
            SolrDocumentList list = resp.getResults();
            total = list.getNumFound();

            if (list != null && !list.isEmpty()) {
                for (SolrDocument doc : list) {
                    SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
                    // To index full doc again
                    iDoc.removeField("_version_");
                    server.add(iDoc);
                }

                System.out.println("Indexed " + (start + rows) + "/" + total);
                start = (start + rows);
            }
        }

        System.out.println("COMPLETELY DONE");

System.out output
-------------------------
Indexed 1252100/1256575
Indexed 1252200/1256575
Indexed 1252300/1256575
Indexed 1252400/1256575
Indexed 1252500/1256575
Indexed 1252600/1256575
Indexed 1252700/1256575
Indexed 1252800/1256575
Indexed 1252900/1256575
Indexed 1253000/1256575
Indexed 1253100/1256566
Indexed 1253200/1256566
Indexed 1253300/1256566
Indexed 1253400/1256566
Indexed 1253500/1256566
Indexed 1253600/1256566
Indexed 1253700/1256566
Indexed 1253800/1256566
Indexed 1253900/1256566
Indexed 1254000/1256566
Indexed 1254100/1256566
Indexed 1254200/1256566
Indexed 1254300/1256566
Indexed 1254400/1256566
Indexed 1254500/1256566
Indexed 1254600/1256566
Indexed 1254700/1256566
Indexed 1254800/1256566
Indexed 1254900/1256566
Indexed 1255000/1256566
Indexed 1255100/1256566
Indexed 1255200/1256566
Indexed 1255300/1256566
Indexed 1255400/1256566
Indexed 1255500/1256566
Indexed 1255600/1256566
Indexed 1255700/1256557
Indexed 1255800/1256557
Indexed 1255900/1256557
Indexed 1256000/1256557
Indexed 1256100/1256557
Indexed 1256200/1256557
Indexed 1256300/1256557
Indexed 1256400/1256557
Indexed 1256500/1256557
COMPLETELY DONE


Thanks,
Ravi Kiran Bhaskar




Re: solr 4.x reindexing issues

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

It seems you are trying to reindex from one server to the other.

Be aware that it could be easier for you to simply copy the whole index folder over to your 4.6.1 server and start Solr, as it will be able to read your 3.x index. This is unless you also want to make major upgrades to your schema or update processors, in which case you'll need a re-index anyway.

If you believe you really need a re-index, then please try to batch index without triggering commits every few seconds - this is really heavy on the system and completely unnecessary. You won't get the benefit of soft commits if you're not running SolrCloud, so no need to configure that.

I would change your <autoCommit> to maxDocs=10000 and maxTime=120000 (every 2 min); see the sketch below. Further, please index without the 1s commitWithin, i.e. instead of
>                server.add(iDoc, 1000);
use
>                server.add(iDoc);

This will make sure the server gets room to breathe and is not constantly generating new indices.
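
A sketch of that autoCommit change, in the same solrconfig.xml form as the
snippets earlier in the thread (openSearcher=false is an assumption added
here - Jan's message did not specify it, but it is a common companion for
bulk loads):

    <autoCommit>
       <maxDocs>10000</maxDocs>
       <maxTime>120000</maxTime>   <!-- every 2 minutes -->
       <openSearcher>false</openSearcher>
    </autoCommit>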

Finally, it's probably not a good idea to use recursion here; you really don't need it, and it fills up your stack. You can instead refactor the method into a loop that does the whole indexing. As a hint, it is generally better to ask for ALL documents in one go and stream to the end, rather than issuing new queries with ever-increasing offsets - high offsets/start values can be time-consuming, especially with multiple shards. If you increase the timeout enough, you should be able to retrieve all documents in one go! (A rough sketch follows.)
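
A hedged sketch of the "one go" approach (the URL and timeout values are
hypothetical; rows is simply set larger than the total doc count, and in
practice this needs generous client memory):

    HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/sitesearchcore");
    server.setSoTimeout(600000);        // generous read timeout for the big response
    SolrQuery query = new SolrQuery("*:*");
    query.addFilterQuery("-allfields:[* TO *]");
    query.setStart(0);
    query.setRows(2000000);             // > 1.5M total docs, so everything comes back at once
    SolrDocumentList all = server.query(query).getResults();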

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
