Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/09/01 10:33:45 UTC

[jira] [Commented] (LUCENE-6764) Payloads should be compressed

    [ https://issues.apache.org/jira/browse/LUCENE-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724976#comment-14724976 ] 

Adrien Grand commented on LUCENE-6764:
--------------------------------------

bq. Payloads should be something small like a byte or two. I don't even think they should be variable length: it's a trap that adds additional per-position noise. We should not encourage putting the contents of Moby Dick per position, nor should we suffer the complexity hassles.

Of course you want payloads to be small. My point was that there is likely a small, finite set of unique payloads, so we could likely store each payload in a couple of _bits_ instead of one or two entire _bytes_.
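
To make the idea concrete, here is a minimal sketch of dictionary-encoding payloads, assuming the set of unique payloads is small. The class name PayloadDictionarySketch and all of its methods are hypothetical and are not part of any Lucene API; payloads are modeled as strings for simplicity, whereas real payloads are byte sequences. It only computes how many bits each position would need and does not do the actual bit packing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not a Lucene API): dictionary-encodes per-position payloads. */
final class PayloadDictionarySketch {
    // Each distinct payload is assigned a small ordinal the first time it is seen.
    private final Map<String, Integer> ordinals = new HashMap<>();
    // Ordinal -> payload, so the original payload can be recovered.
    private final List<String> dictionary = new ArrayList<>();
    // One ordinal per position; a real encoder would bit-pack these values.
    private final List<Integer> perPosition = new ArrayList<>();

    /** Records the payload of the next position. */
    void add(String payload) {
        Integer ord = ordinals.get(payload);
        if (ord == null) {
            ord = dictionary.size();
            ordinals.put(payload, ord);
            dictionary.add(payload);
        }
        perPosition.add(ord);
    }

    /** Number of distinct payloads seen so far. */
    int uniquePayloads() {
        return dictionary.size();
    }

    /** Bits needed to store one ordinal; 0 when all positions share one payload. */
    int bitsPerPosition() {
        int maxOrd = Math.max(0, dictionary.size() - 1);
        return 32 - Integer.numberOfLeadingZeros(maxOrd);
    }

    /** Decodes the payload stored at the given position. */
    String payloadAt(int position) {
        return dictionary.get(perPosition.get(position));
    }
}
```

For example, adding the payloads h1, b, b, p, h1 produces three dictionary entries, so each position needs two bits rather than a full byte or more.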

> Payloads should be compressed
> -----------------------------
>
>                 Key: LUCENE-6764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> I think we should at least try to do something simple, e.g. deduplicate or apply simple LZ77 compression. For instance, if you use enclosing HTML tags to give different weights to individual terms, there might be lots of repetition, as there are not that many unique HTML tags.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
