Posted to solr-user@lucene.apache.org by "Li, Ryan" <Ry...@sensis.com.au> on 2014/09/04 04:14:25 UTC

Solr add document over 20 times slower after upgrade from 4.0 to 4.9

I index 2,500 documents (up to 50 MB each, 3 MB on average) into a Solr server. On Solr 4.0, indexing finished in 3 hours.

However, after we upgraded to Solr 4.9, the same indexing takes 3 days to finish.

I've done some profiling. These are the numbers I get:

Document size   Add time (Solr 4.0)   Add time (Solr 4.9)
1.18            6 s                   123 s
2.26            12 s                  444 s
3.35            18 s                  over 600 s
9.65            46 s                  timeout

From what I can see, indexing time grows roughly linearly, O(n), with document size on Solr 4.0, but much faster than linearly on Solr 4.9. I also tried commenting out some copied fields to narrow down the problem, and it seems the size of the document after indexing (we copy fields, and the more fields we copy, the bigger the index) is the dominant factor in indexing time.
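One way to check the scaling in the table is the log-log slope k between two rows, where s is document size and t is add time: k near 1 means time grows linearly with size, k near 2 means roughly quadratically. This sketch uses the 1.18 and 2.26 rows and assumes the size column is in one consistent unit throughout:

```latex
k = \frac{\ln(t_2 / t_1)}{\ln(s_2 / s_1)}

k_{4.0} = \frac{\ln(12/6)}{\ln(2.26/1.18)} \approx \frac{0.693}{0.650} \approx 1.07

k_{4.9} = \frac{\ln(444/123)}{\ln(2.26/1.18)} \approx \frac{1.284}{0.650} \approx 1.98
```

On those two rows, 4.0 is close to linear and 4.9 close to quadratic. The 3.35 row (over 600 s on 4.9) gives k > 1.5 against the 1.18 row, so the superlinear trend holds there too, though four data points are a thin basis for a complexity estimate.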

Has anyone experienced a similar problem? Does this sound like a bug in Solr, or are we just using Solr 4.9 wrong?

Here is one example of a field definition from my schema file:
        <fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <charFilter class="solr.HTMLStripCharFilterFactory"/>
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement="" /> <!-- strip off all apostrophe (') characters -->
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-index-synonyms.txt"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English" />
                <!-- Used to have  language="English" - seems this param is gone in 4.9 -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <charFilter class="solr.HTMLStripCharFilterFactory"/>
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement="" /> <!-- strip off all apostrophe (') characters -->
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-query-colloq-synonyms.txt"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English" />
                <!-- Used to have  language="English" - seems this param is gone in 4.9 -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>
Field:
<field name="majorTextSignalStem" type="text_stem" indexed="true" stored="false" multiValued="true" omitNorms="false"/>
Copy:
 <copyField dest="majorTextSignalStem" source="majorTextSignalRaw" />

Thanks,
Ryan


Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

Posted by Erick Erickson <er...@gmail.com>.
Ryan:

As it happens, there's a discussion on the dev list about this.

If at all possible, could you try a brief experiment? Turn off
all the storage, i.e. set stored="false" on all fields. It's a lot
to ask, but it'd help the discussion.

Or join the discussion at https://issues.apache.org/jira/browse/LUCENE-5914.
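For reference, a minimal schema.xml sketch of the stored="false" experiment, built on the field and copyField Ryan posted. The definition of the copyField source majorTextSignalRaw was not included in the thread, so its type below (text_general) is an assumed placeholder. Stored-field compression (added in 4.1) applies only to stored values, and term vector compression (added in 4.2) only to fields that index term vectors, so an all-unstored schema with term vectors off should largely take both out of the indexing path.

```xml
<!-- Sketch only: majorTextSignalRaw's real definition was not posted;
     type="text_general" is an assumed placeholder. -->
<field name="majorTextSignalRaw" type="text_general"
       indexed="true" stored="false" termVectors="false" multiValued="true"/>

<!-- The posted field is already unstored; termVectors="false" is the
     default and is spelled out only to make the experiment explicit. -->
<field name="majorTextSignalStem" type="text_stem"
       indexed="true" stored="false" termVectors="false"
       multiValued="true" omitNorms="false"/>

<!-- copyField copies the incoming source value, so the source field
     does not need to be stored for the copy to work. -->
<copyField source="majorTextSignalRaw" dest="majorTextSignalStem"/>
```

The same stored="false" change would be repeated for every other field in the schema for the duration of the timing test.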

Best,
Erick

On Thu, Sep 4, 2014 at 1:08 AM, Shawn Heisey <so...@elyograg.org> wrote:
> One possible source of problems with that particular upgrade is the fact
> that stored field compression was added in 4.1, and term vector
> compression was added in 4.2.  They are on by default and cannot be
> turned off.  The compression is typically fast, but with very large
> documents like yours, it might result in pretty major computational
> overhead.  It can also require additional Java heap, which ties into
> what follows:
>
> Another problem might be RAM-related.
>
> If your Java heap is very large, or just a little bit too small, there
> can be major performance issues from garbage collection.  Based on the
> fact that the earlier version performed well, a too-small heap is more
> likely than a very large heap.
>
> If your index size is such that it can't be effectively cached by the
> amount of total RAM on the machine (minus the Java heap assigned to
> Solr), that can cause performance problems.  Your index size is likely
> to be several gigabytes, and might even reach double-digit gigabytes.
> Can you share those numbers: index size, Java heap size, and total
> system RAM?  If you can, it would also be a good idea to share your
> solrconfig.xml.
>
> Here's a wiki page that goes into more detail about possible performance
> issues.  It doesn't mention the possible compression problem:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
>
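When sharing solrconfig.xml for an indexing-speed question, the sections that most directly affect indexing are usually <indexConfig> and the commit settings in <updateHandler>; the RAM buffer configured there also lives on the Java heap. As a sketch of what to look for, this is roughly what those sections can look like in a Solr 4.x config. Every value is an illustrative placeholder, not Ryan's configuration and not a tuning recommendation:

```xml
<!-- Illustrative Solr 4.x-style fragment; all values are placeholders. -->
<indexConfig>
  <!-- Heap memory used to buffer newly added documents; when it fills,
       Lucene flushes a new segment to disk. -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <!-- Number of segments merged at a time (a 4.x-era setting). -->
  <mergeFactor>10</mergeFactor>
  <lockType>${solr.lock.type:native}</lockType>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Periodic hard commit that flushes to disk without opening
       a new searcher. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```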
