
Posted to dev@lucene.apache.org by "Steven Bower (JIRA)" <ji...@apache.org> on 2014/07/19 05:50:38 UTC

[jira] [Updated] (SOLR-6259) Performance issue with large number of fields and values when using copyFields

     [ https://issues.apache.org/jira/browse/SOLR-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Bower updated SOLR-6259:
-------------------------------

    Attachment: SOLR-6259.patch

Attached a patch that fixes this issue. It combines two changes: tracking which fields have already been used in a HashSet, and moving repeated checks from inner loops to outer loops.
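
The two techniques named above can be sketched roughly as follows. This is only an illustration with plain Java collections, not the contents of SOLR-6259.patch and not Lucene or Solr code; the class `FieldTrackingSketch` and all of its methods are hypothetical names made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a document; NOT the Lucene Document class. */
class FieldTrackingSketch {

    /** A (name, value) pair standing in for an indexed field. */
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    private final List<Field> fields = new ArrayList<>();
    // Names already added; answers "was this field used?" without scanning the list.
    private final Set<String> usedNames = new HashSet<>();

    /** Linear check: walks the field list until a name matches. */
    boolean containsLinear(String name) {
        for (Field f : fields) {
            if (f.name.equals(name)) {
                return true;
            }
        }
        return false;
    }

    /** Hashed check: consults the set maintained by addValues(). */
    boolean containsHashed(String name) {
        return usedNames.contains(name);
    }

    /**
     * Adds every value of one field. The "already used?" check runs once per
     * field, outside the value loop, instead of once per value.
     * Returns true if this was the first use of the field name.
     */
    boolean addValues(String name, List<String> values) {
        boolean firstUse = !usedNames.contains(name); // hoisted out of the loop
        for (String v : values) {
            fields.add(new Field(name, v));
        }
        usedNames.add(name);
        return firstUse;
    }

    int size() {
        return fields.size();
    }

    public static void main(String[] args) {
        FieldTrackingSketch doc = new FieldTrackingSketch();
        doc.addValues("title", Arrays.asList("a", "b"));
        // Both checks agree; only their cost differs as the field list grows.
        System.out.println(doc.containsLinear("title") == doc.containsHashed("title"));
    }
}
```

The set costs a little extra memory per document, in exchange for a check whose cost does not grow with the number of fields already added.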

> Performance issue with large number of fields and values when using copyFields
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-6259
>                 URL: https://issues.apache.org/jira/browse/SOLR-6259
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.8.1
>            Reporter: Steven Bower
>            Priority: Critical
>         Attachments: SOLR-6259.patch
>
>
> When you have a schema with a large number of fields (around 250 in my case) and you use copyFields to populate a few of them (3-4 in my case), ingestion performance degrades severely.
> Tracking this down with a profiler showed that the Lucene Document.getField() method was using 87% of all CPU time. As it turns out, getField() iterates over the list of fields in the Document and returns a field when its name matches. With copyFields and lots of values, getField() gets called a lot.
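
To make the cost described in the report concrete, here is a small model of a name lookup that scans a list, counting the string comparisons it performs. It is an assumption-laden illustration, not Lucene code: `LinearLookupCost`, `scan` and `totalComparisons` are hypothetical names, and the figures 250 and 1000 are just sample inputs (250 echoes the field count in the report).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a lookup that scans a list by name; NOT Lucene code. */
class LinearLookupCost {

    /** Scans names front to back and returns how many comparisons were made. */
    static int scan(List<String> names, String wanted) {
        int comparisons = 0;
        for (String n : names) {
            comparisons++;
            if (n.equals(wanted)) {
                break; // found it, stop scanning
            }
        }
        return comparisons;
    }

    /** Total comparisons for `lookups` searches of the last of `numFields` names. */
    static long totalComparisons(int numFields, int lookups) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numFields; i++) {
            names.add("field_" + i);
        }
        String last = "field_" + (numFields - 1);
        long total = 0;
        for (int i = 0; i < lookups; i++) {
            total += scan(names, last);
        }
        return total;
    }

    public static void main(String[] args) {
        // 1000 lookups of the last of 250 names: 250 comparisons each.
        System.out.println(totalComparisons(250, 1000)); // prints 250000
    }
}
```

In this model the work grows with the product of the number of fields and the number of lookups, which is consistent with a per-value lookup dominating CPU time once both numbers are large.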



--
This message was sent by Atlassian JIRA
(v6.2#6252)
