Posted to issues@lucene.apache.org by "Michael McCandless (Jira)" <ji...@apache.org> on 2020/02/01 00:37:00 UTC
[jira] [Commented] (LUCENE-9191) Fix linefiledocs compression or replace in tests
[ https://issues.apache.org/jira/browse/LUCENE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027912#comment-17027912 ]
Michael McCandless commented on LUCENE-9191:
--------------------------------------------
Thanks [~rcmuir], it sounds like we could pre-split the file into N chunks, compress each chunk separately, concatenate the results, and record the resulting seek points (in compressed-byte space), maybe in a simple {{.txt}} file side-by-side with the {{.gz}} file? Or maybe the Java gzip APIs could somehow read metadata up front and determine the (fast) skip points dynamically?
Thanks to the chunked {{.gz}} encoding, the resulting file should be a valid {{.gz}} file too.
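A rough sketch of the pre-split idea, to make it concrete. This is not the actual LineFileDocs code: the class name {{ChunkedGzipSketch}}, its methods, and the in-memory byte arrays are all hypothetical, and it assumes {{linesPerChunk}} is positive. It only relies on {{java.util.zip}} behavior: each chunk is written as its own gzip member, and {{GZIPInputStream}} can start decoding at any member boundary.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch; not the actual LineFileDocs code. */
final class ChunkedGzipSketch {

  /** The concatenated gzip bytes plus the compressed offset where each chunk starts. */
  static final class Result {
    final byte[] gz;
    final List<Long> seekPoints;

    Result(byte[] gz, List<Long> seekPoints) {
      this.gz = gz;
      this.seekPoints = seekPoints;
    }
  }

  /** Compresses every linesPerChunk lines as its own gzip member, appended to one buffer. */
  static Result compress(List<String> lines, int linesPerChunk) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    List<Long> seekPoints = new ArrayList<>();
    try {
      for (int start = 0; start < lines.size(); start += linesPerChunk) {
        // Record where this member begins, in compressed-byte space.
        seekPoints.add((long) out.size());
        int end = Math.min(start + linesPerChunk, lines.size());
        // Closing the gzip stream finishes the member; closing the underlying
        // ByteArrayOutputStream is a no-op, so the next member can be appended.
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
          for (String line : lines.subList(start, end)) {
            gz.write((line + "\n").getBytes(StandardCharsets.UTF_8));
          }
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new Result(out.toByteArray(), seekPoints);
  }

  /** Reads up to maxLines lines, starting decompression at a recorded seek point. */
  static List<String> readFrom(byte[] gz, long seekPoint, int maxLines) {
    int offset = (int) seekPoint;
    ByteArrayInputStream in = new ByteArrayInputStream(gz, offset, gz.length - offset);
    List<String> result = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new GZIPInputStream(in), StandardCharsets.UTF_8))) {
      String line;
      while (result.size() < maxLines && (line = reader.readLine()) != null) {
        result.add(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }
}
```

With this, a "random skip" would amount to picking a random entry from {{seekPoints}} and calling {{readFrom}} at that offset, instead of decompressing everything before it. The seek points could be written one per line to the side-by-side {{.txt}} file, and reading from offset 0 still decodes the whole thing as an ordinary {{.gz}} stream.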
> Fix linefiledocs compression or replace in tests
> ------------------------------------------------
>
> Key: LUCENE-9191
> URL: https://issues.apache.org/jira/browse/LUCENE-9191
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Priority: Major
>
> LineFileDocs(random) is very slow, even to open. It does a very slow "random skip" through a gzip compressed file.
> For the analyzers tests, in LUCENE-9186 I simply removed its usage, since TestUtil.randomAnalysisString is superior, and fast. But we should address other tests using it, since LineFileDocs(random) is slow!
> I think it is also the case that every lucene test has probably tested every LineFileDocs line many times now, whereas randomAnalysisString will invent new ones.
> Alternatively, we could "fix" LineFileDocs(random), e.g. special compression options (in blocks)... deflate supports such stuff. But it would make it even hairier than it is now.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)