Posted to issues@lucene.apache.org by "Renaud Delbru (Jira)" <ji...@apache.org> on 2022/03/01 15:35:00 UTC

[jira] [Updated] (LUCENE-10449) Unnecessary ByteArrayDataInput introduced with compression on binary doc values

     [ https://issues.apache.org/jira/browse/LUCENE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renaud Delbru updated LUCENE-10449:
-----------------------------------
    Description: 
LUCENE-9211 introduced a compression mechanism for binary doc values, which was then removed at a later stage in LUCENE-9843 as it was impacting performance on some workloads.

However, LUCENE-9843 didn't fully revert the code to its prior state. Instead of reading the block directly from the `IndexInput` as in [1], the `decompressBlock()` call [2] is kept, which (from our understanding) decompresses a block that is not actually compressed. The `decompressBlock` method delegates to `LZ4.decompress`, and this appears to add significant overhead (e.g., per-byte `readByte` calls).
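
To make the difference concrete, here is a simplified sketch of the two read paths. This is a sketch only: the class, method, and variable names are ours, not the actual Lucene doc values producer code.
{code:java}
import java.io.IOException;
import org.apache.lucene.store.ByteArrayDataInput;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.util.compress.LZ4;

// Simplified sketch of the two read paths; names and structure are ours,
// not the actual Lucene doc values producer code.
class ReadPathSketch {

  // Pre-LUCENE-9211 path [1]: one bulk copy straight out of the IndexInput.
  static void readDirect(IndexInput data, long startOffset, byte[] dest, int length)
      throws IOException {
    data.seek(startOffset);
    data.readBytes(dest, 0, length);
  }

  // Path kept by LUCENE-9843 [2]: the block is still routed through
  // LZ4.decompress into a scratch buffer wrapped in a ByteArrayDataInput,
  // even though the block is no longer compressed. LZ4.decompress consumes
  // its input largely through per-byte readByte() calls, which is where the
  // profiled overhead shows up.
  static ByteArrayDataInput readViaDecompress(IndexInput data, long blockStart,
      byte[] blockBytes, int uncompressedLength) throws IOException {
    data.seek(blockStart);
    LZ4.decompress(data, uncompressedLength, blockBytes, 0);
    return new ByteArrayDataInput(blockBytes);
  }
}
{code}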

This has quite an impact on our workloads, which use doc values heavily. It can lead to performance regressions from 2x up to 5x. See the samples below.

 
{code:none}
❯ times_tasks Elasticsearch 7.10.2 (Lucene 8.7) - no binary compression
name                      type                        time_min time_max time_p50 time_p90
7.10.2-22.6-SNAPSHOT.json total                       42       90       45       66
7.10.2-22.6-SNAPSHOT.json SearchJoinRequest1          14       32       15       18
7.10.2-22.6-SNAPSHOT.json SearchTaskBroadcastRequest2 23       53       27       43

❯ times_tasks Elasticsearch 7.17.1 (Lucene 8.11) - with binary compression
name                      type                        time_min time_max time_p50 time_p90
7.17.0-27.1-SNAPSHOT.json total                       253      327      285      310
7.17.0-27.1-SNAPSHOT.json SearchJoinRequest1          121      154      142      152
7.17.0-27.1-SNAPSHOT.json SearchTaskBroadcastRequest2 122      173      140      152

❯ times_tasks Elasticsearch 7.17.1 (Lucene 8.11) - lucene_default codec is used to bypass the binary compression 
name                        type                        time_min time_max time_p50 time_p90
7.17.0-27.1-SNAPSHOT.json.2 total                       48       96       63       75
7.17.0-27.1-SNAPSHOT.json.2 SearchJoinRequest1          19       44       25       31
7.17.0-27.1-SNAPSHOT.json.2 SearchTaskBroadcastRequest2 23       42       29       37

❯ times_tasks Elasticsearch 8.0 (Lucene 9.0) - no binary compression
name                     type                        time_min time_max time_p50 time_p90
8.0.0-28.0-SNAPSHOT.json total                       260      327      287      313
8.0.0-28.0-SNAPSHOT.json SearchJoinRequest1          122      168      148      158
8.0.0-28.0-SNAPSHOT.json SearchTaskBroadcastRequest2 123      165      139      155{code}
We can clearly see that in Lucene 9.0, even after the removal of the binary doc values compression, performance didn't improve. Profiling the execution indicates that the bottleneck is `LZ4.decompress`. We have attached two flamegraph screenshots.

The CPU time of the `TermsDict.next` method in Lucene 8.11 with no compression is around 2 seconds, while the CPU time of the same method in Lucene 9.0 is 12 seconds. This was measured on a small benchmark that reads a binary doc values field a fixed number of times. Each document is created with a single binary value that represents a UUID.
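
For reference, a minimal, self-contained sketch of this kind of micro-benchmark (this is not our actual harness; the field name, document count, and pass count are illustrative):
{code:java}
import java.nio.file.Files;
import java.util.UUID;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class BinaryDocValuesReadBench {
  public static void main(String[] args) throws Exception {
    try (FSDirectory dir = FSDirectory.open(Files.createTempDirectory("bdv-bench"))) {
      // Index documents, each with a single binary doc value holding a UUID.
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        for (int i = 0; i < 100_000; i++) {
          Document doc = new Document();
          doc.add(new BinaryDocValuesField("uuid", new BytesRef(UUID.randomUUID().toString())));
          writer.addDocument(doc);
        }
        writer.forceMerge(1); // single segment, so a single leaf to iterate
      }
      // Read the field a fixed number of times and report wall-clock time.
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        LeafReader leaf = reader.leaves().get(0).reader();
        long sink = 0;
        long start = System.nanoTime();
        for (int pass = 0; pass < 100; pass++) {
          BinaryDocValues values = leaf.getBinaryDocValues("uuid");
          for (int doc = values.nextDoc();
              doc != DocIdSetIterator.NO_MORE_DOCS;
              doc = values.nextDoc()) {
            sink += values.binaryValue().length; // consume the value so reads aren't optimized away
          }
        }
        System.out.println("took " + (System.nanoTime() - start) / 1_000_000 + " ms (sink=" + sink + ")");
      }
    }
  }
}
{code}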

 

[1] [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.0/lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java#L1159]

[2] [https://github.com/apache/lucene/commit/a7a02519f0a5652110a186f4909347ac3349092d#diff-ab443662a6310fda675a4bd6d01fabf3a38c4c825ec2acef8f9a34af79f0b252R1022]

 

> Unnecessary ByteArrayDataInput introduced with compression on binary doc values
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10449
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10449
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 9.0
>            Reporter: Renaud Delbru
>            Priority: Major
>         Attachments: lucene-8.11-no-compression.png, lucene-9.png
>
>


