Posted to dev@lucene.apache.org by bu...@apache.org on 2004/09/09 21:35:02 UTC

DO NOT REPLY [Bug 31149] New: - [PATCH] to store binary fields with compression

http://issues.apache.org/bugzilla/show_bug.cgi?id=31149

           Summary: [PATCH] to store binary fields with compression
           Product: Lucene
           Version: CVS Nightly - Specify date in submission
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: Index
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: bernhard.messer@intrafind.de


hi all,

As promised, here is the enhancement for the binary field patch with optional
compression. The attachment includes all necessary diffs based on the latest
version from CVS. There is also a small JUnit test case that tests the core
functionality of binary field compression. The base implementation for binary
fields, on which this patch relies, can be found in patch #29370. The existing
unit tests pass fine.
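
To illustrate the general idea of compressing a stored binary value, here is a
minimal sketch using java.util.zip from the JDK. It is not taken from the
attached diffs: the class name BinaryFieldCompressionSketch and the methods
compress/decompress are hypothetical, and the choice of DEFLATE at
BEST_COMPRESSION is an assumption made for the example only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of the patch: round-trips a byte[] through DEFLATE.
public class BinaryFieldCompressionSketch {

    // Compress the bytes that would be stored for a binary field.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore the original bytes when the stored field is read back.
    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length * 2);
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read: the data is truncated or invalid.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("input is not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive sample text, so that compression has something to work with.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("content stored in a binary field ");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] restored = decompress(packed);
        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("restored bytes:   " + restored.length);
    }
}
```

The trade-off is the usual one: the compressed value costs extra CPU time when
it is written and read, in exchange for fewer bytes on disk.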

To test binary fields and compression, I created an index from 2700 plain text
files (avg. 6 KB per file) and stored all file content in that index, first
without compression and then with it. The test was built using the IndexFiles
class from the demo distribution. Building the index and storing all content
without compression took about 60 seconds, and the final index size was 21 MB.
Running the same test with compression switched on, the indexing time increased
to 75 seconds, but the final index size shrank to 13 MB. This is less than the
plain text files themselves need in the file system (15 MB).
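
For anyone who wants the trade-off as relative numbers, the figures above work
out to roughly 25% more indexing time for an index that is roughly 38% smaller.
The small sketch below just redoes that arithmetic; the class name
CompressionTradeoff and the method percentChange are made up for illustration,
and the inputs are the rounded measurements quoted above.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the relative changes from the quoted figures.
public class CompressionTradeoff {

    // Relative change from a baseline in percent (negative means a decrease).
    static double percentChange(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double plainSeconds = 60;       // indexing time without compression
        double compressedSeconds = 75;  // indexing time with compression
        double plainIndexMb = 21;       // index size without compression
        double compressedIndexMb = 13;  // index size with compression
        double sourceFilesMb = 15;      // size of the plain text files on disk

        System.out.printf(Locale.ROOT, "indexing time change:    %+.1f%%%n",
                percentChange(plainSeconds, compressedSeconds));
        System.out.printf(Locale.ROOT, "index size change:       %+.1f%%%n",
                percentChange(plainIndexMb, compressedIndexMb));
        System.out.printf(Locale.ROOT, "index vs. source files:  %+.1f%%%n",
                percentChange(sourceFilesMb, compressedIndexMb));
    }
}
```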

Hopefully this patch helps people who deal with huge indexes and want to store
more than just 300 bytes per document to display a well-formed summary.

regards
Bernhard
