Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2015/08/27 15:45:46 UTC

[jira] [Commented] (LUCENE-6764) Payloads should be compressed

    [ https://issues.apache.org/jira/browse/LUCENE-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716681#comment-14716681 ] 

Robert Muir commented on LUCENE-6764:
-------------------------------------

I disagree. Payloads should be something small, like a byte or two. I don't even think they should be variable length: it's a trap that adds additional per-position noise. We should not encourage putting the contents of Moby Dick at every position, nor should we suffer the complexity hassles.

> Payloads should be compressed
> -----------------------------
>
>                 Key: LUCENE-6764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> I think we should at least try to do something simple, e.g. deduplicate or apply simple LZ77 compression. For instance, if you use enclosing HTML tags to give different weights to individual terms, there might be lots of repetition, as there are not that many unique HTML tags.
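
To make the deduplication idea in the issue description concrete, here is a minimal sketch in plain Java. It is not Lucene code and does not reflect how any Lucene codec stores payloads; the class PayloadDedup and its methods are hypothetical names invented for illustration. It assumes payloads are short byte arrays (such as HTML tag names) that repeat often, and replaces each one with a small integer id into a dictionary of distinct values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of dictionary-based payload deduplication (not a Lucene API). */
public class PayloadDedup {
    // Distinct payloads in first-seen order; the list index is the payload's id.
    private final List<byte[]> dictionary = new ArrayList<>();
    // ByteBuffer compares by content, so it works as a map key for byte arrays.
    private final Map<ByteBuffer, Integer> ids = new HashMap<>();

    /** Returns the id of the payload, adding it to the dictionary if it is new. */
    public int encode(byte[] payload) {
        byte[] copy = payload.clone(); // defensive copy so later caller edits cannot corrupt the key
        ByteBuffer key = ByteBuffer.wrap(copy);
        Integer id = ids.get(key);
        if (id == null) {
            id = dictionary.size();
            dictionary.add(copy);
            ids.put(key, id);
        }
        return id;
    }

    /** Returns a copy of the payload stored under the given id. */
    public byte[] decode(int id) {
        return dictionary.get(id).clone();
    }

    /** Number of distinct payloads seen so far. */
    public int dictionarySize() {
        return dictionary.size();
    }

    public static void main(String[] args) {
        PayloadDedup dedup = new PayloadDedup();
        // Assumed example input: one enclosing tag name per term position.
        String[] tags = {"h1", "p", "p", "b", "p", "h1"};
        int[] encoded = new int[tags.length];
        for (int i = 0; i < tags.length; i++) {
            encoded[i] = dedup.encode(tags[i].getBytes(StandardCharsets.UTF_8));
        }
        // Prints: [0, 1, 1, 2, 1, 0] distinct=3
        System.out.println(Arrays.toString(encoded) + " distinct=" + dedup.dictionarySize());
    }
}
```

With only a handful of distinct tags, each id fits in a single byte, which also happens to match the "byte or two" size argued for in the comment above. The trade-off is the extra dictionary that must be stored and consulted when reading.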


