Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2016/10/18 02:59:58 UTC

[jira] [Comment Edited] (SOLR-9599) DocValues performance regression with new iterator API

    [ https://issues.apache.org/jira/browse/SOLR-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584223#comment-15584223 ] 

Yonik Seeley edited comment on SOLR-9599 at 10/18/16 2:59 AM:
--------------------------------------------------------------

Another docvalues faceting test, this time including the current lucene/solr code + lucene70 codec (as of 10/17).
This test used 10M documents and single valued string fields with 20% of the values missing (i.e. 80% of docs have a value for any given field).
Note that the 9/19 index has 24 segments and the 10/17 index has 23 segments.

This is a table of new_time/old_time, where old_time is the time for old code (as of 9/09, before the docvalues iterator cutover) on an old docvalues index:
||field cardinality||9/09 code with 9/09 index||10/17 code with 9/09 index||10/17 code with 10/17 index||
| 10 | 1.00 | 1.39 | 1.41 |
| 100 | 1.00 | 1.38 | 1.46 |
| 1000 | 1.00 | 1.39 | 1.42 |
| 10000 | 1.00 | 1.35 | 1.45 |

So it looks like we're currently over 40% slower in general for faceting on single valued docvalue fields that have some values missing.
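
For reference, the ratios in the table above can be turned into percent slowdowns with a few lines of Python. This is only a sketch of the arithmetic: the ratio values are copied from the table, while the names `RATIOS`, `percent_slower` and `slowdown_range` are made up for illustration and are not part of any benchmark harness.

```python
# new_time / old_time ratios copied from the table above, keyed by field cardinality.
# The configuration labels are shorthand for the table's column headers.
RATIOS = {
    "10/17 code, 9/09 index": {10: 1.39, 100: 1.38, 1000: 1.39, 10000: 1.35},
    "10/17 code, 10/17 index": {10: 1.41, 100: 1.46, 1000: 1.42, 10000: 1.45},
}


def percent_slower(ratio):
    """Convert a new_time/old_time ratio into a percent slowdown (1.41 -> 41.0)."""
    return round((ratio - 1.0) * 100.0, 1)


def slowdown_range(ratios_by_cardinality):
    """Return the (smallest, largest) percent slowdown across all cardinalities."""
    slowdowns = [percent_slower(r) for r in ratios_by_cardinality.values()]
    return min(slowdowns), max(slowdowns)


if __name__ == "__main__":
    for config, by_cardinality in RATIOS.items():
        low, high = slowdown_range(by_cardinality)
        print(f"{config}: {low}% to {high}% slower")
```

A ratio of 1.00 maps to 0%, so the 9/09 baseline column is left out of `RATIOS`.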



was (Author: yseeley@gmail.com):
Another docvalues faceting test, this time including the current lucene/solr code +  lucene70 codec (as of 10/17) 
This test used 10M documents and single valued string fields with 20% of the values missing (i.e. 80% of docs have a value for any given field).
Note that the 9/19 index has 24 segments and the 10/17 index has 23 segments.

This is a table of new_time/old_time, with old_time being an old docvalues index with old code (as of 9/09) before the docvalues iterator cutover:
||field cardinality||10/17 code with 9/09 index|| 10/17 code with 10/17 index||
| 10 | 1.39 | 1.41 |
| 100 | 1.38 | 1.46 |
| 1000 | 1.39 | 1.42 |
| 10000 | 1.35 | 1.45 |

So it looks like we're over 40% slower in general for faceting on single valued docvalue fields that have some values missing.


> DocValues performance regression with new iterator API
> ------------------------------------------------------
>
>                 Key: SOLR-9599
>                 URL: https://issues.apache.org/jira/browse/SOLR-9599
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: master (7.0)
>            Reporter: Yonik Seeley
>             Fix For: master (7.0)
>
>
> I did a quick performance comparison of faceting indexed fields (i.e. docvalues are not stored) using method=dv before and after the new docvalues iterator went in (LUCENE-7407).
> 5M document index, 21 segments, single valued string fields w/ no missing values.
> || field cardinality || new_time / old_time ||
> |10|2.01|
> |1000|2.02|
> |10000|1.85|
> |100000|1.56|
> |1000000|1.31|
> So unfortunately, often twice as slow.
> See followup messages for tests using real docvalues as well.
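
To make the two access patterns concrete, here is a toy Python model of facet counting with a random-access ordinal lookup versus a forward-only iterator. It is not Lucene or Solr code and says nothing about where the measured time actually goes; `ToyOrdIterator`, `count_random_access` and `count_iterator` are hypothetical names, and the use of -1 for "no value" is an assumption of the sketch.

```python
NO_MORE_DOCS = 2**31 - 1  # sentinel for an exhausted iterator (assumed convention)


class ToyOrdIterator:
    """Hypothetical forward-only iterator over docs that have a value."""

    def __init__(self, ord_by_doc):
        # Keep only docs with a value (ord >= 0), in increasing doc id order.
        self._docs = [d for d, o in enumerate(ord_by_doc) if o >= 0]
        self._ords = [o for o in ord_by_doc if o >= 0]
        self._pos = -1
        self.doc = -1

    def advance(self, target):
        """Move to the first doc >= target that has a value; target must be > self.doc."""
        self._pos += 1
        while self._pos < len(self._docs) and self._docs[self._pos] < target:
            self._pos += 1
        self.doc = self._docs[self._pos] if self._pos < len(self._docs) else NO_MORE_DOCS
        return self.doc

    def ord_value(self):
        """Ordinal of the current doc; only valid while positioned on a doc."""
        return self._ords[self._pos]


def count_random_access(matching_docs, ord_by_doc, cardinality):
    """Old-style pattern: look up the ordinal for each matching doc directly."""
    counts = [0] * cardinality
    for doc in matching_docs:
        o = ord_by_doc[doc]
        if o >= 0:  # -1 marks a missing value in this toy model
            counts[o] += 1
    return counts


def count_iterator(matching_docs, it, cardinality):
    """Iterator-style pattern: advance to each matching doc, then check for a hit."""
    counts = [0] * cardinality
    for doc in matching_docs:  # must be in increasing order
        if doc > it.doc:
            it.advance(doc)
        if it.doc == doc:
            counts[it.ord_value()] += 1
    return counts


# Eight docs, three distinct values, two docs (1 and 4) without a value.
ORD_BY_DOC = [0, -1, 1, 1, -1, 0, 2, 1]
MATCHING = [0, 1, 3, 5, 6, 7]
```

Both functions return the same counts for the same input; the only difference is the extra positioning step the iterator version performs per matching document.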


