Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/10/16 13:22:05 UTC

[jira] [Commented] (LUCENE-6841) LZ4 compression using too much CPU time

    [ https://issues.apache.org/jira/browse/LUCENE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960523#comment-14960523 ] 

Adrien Grand commented on LUCENE-6841:
--------------------------------------

This issue is a bit tricky: some users wish that stored fields compression were more aggressive, while others wish that it were less aggressive. At the same time, we want to support as few options as possible in order to keep backward compatibility manageable (we already have two and certainly want to avoid a third).

Are you sure that the issue is with LZ4 and not I/O? The method that performs decompression interleaves I/O with the actual decompression, so in the output of a profiler it could easily look like the time is spent in LZ4 when it is actually spent on I/O.
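
One way to tell the two apart outside a profiler is to time the read and the decompression as separate phases. The following is only a rough sketch under stated assumptions, not Lucene code: it uses java.util.zip (DEFLATE) as a stand-in because LZ4 is not part of the JDK, and the class and method names (IoVsDecompress, readAll, timePhases) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: separates "read the bytes" from "decompress the bytes".
final class IoVsDecompress {

    /** Compresses data with DEFLATE (assumption: a stand-in for LZ4). */
    public static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** I/O phase: pulls the whole stream into memory without decompressing. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** CPU phase: decompresses bytes that are already in memory. */
    public static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new DataFormatException("truncated or invalid input");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    /** Returns {readNanos, decompressNanos} for a single pass over the stream. */
    public static long[] timePhases(InputStream in) throws IOException, DataFormatException {
        long t0 = System.nanoTime();
        byte[] compressed = readAll(in);
        long t1 = System.nanoTime();
        decompress(compressed);
        long t2 = System.nanoTime();
        return new long[] {t1 - t0, t2 - t1};
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("field").append(i % 50).append('=').append("some stored value;");
        }
        byte[] packed = compress(sb.toString().getBytes(StandardCharsets.UTF_8));
        // An in-memory stream stands in for the index file here; replace it
        // with a FileInputStream over real data to measure actual I/O.
        long[] nanos = timePhases(new ByteArrayInputStream(packed));
        System.out.println("read: " + nanos[0] + " ns, decompress: " + nanos[1] + " ns");
    }
}
```

If the read phase dominates once real files are used, the cost a profiler attributes to the decompression method is more likely I/O than compression work. A single pass like this is noisy, so repeated runs after a warm-up would give a fairer picture.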

Also, what kind of workload do you have (number of docs, size of individual docs, number of docs retrieved per page)? Is there any chance that your index fits entirely in memory?

> LZ4 compression using too much CPU time
> ---------------------------------------
>
>                 Key: LUCENE-6841
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6841
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 5.3.1
>         Environment: Linux, Java 8
>            Reporter: Karl von Randow
>
> I am using Lucene for search indexing, including storing a large number of small fields, and some larger plain text fields, and searching using both exact matches and analyzed queries.
> LZ4 (specifically the decompress method) is using almost exactly 50% of the application's CPU time.
> It seems to me that LZ4 is inappropriate for my use case. I note that I can choose BEST_SPEED or BEST_COMPRESSION.
> Would it be palatable to add a NO_COMPRESSION option, or some way to pick and choose which fields get compressed? Perhaps a minimum length of a field could be specified before it's compressed? I'm not sure if that's possible.
> If this approach, or something similar, is palatable, I would be happy to contribute a patch (or to consume and test a patch).



