Posted to issues@lucene.apache.org by "Jason Gerlowski (Jira)" <ji...@apache.org> on 2021/03/29 18:46:00 UTC

[jira] [Updated] (LUCENE-9893) Document or fix CodecUtil's codec requirements/limitations

     [ https://issues.apache.org/jira/browse/LUCENE-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated LUCENE-9893:
------------------------------------
    Summary: Document or fix CodecUtil's codec requirements/limitations  (was: Document or fix CodecUtil limitations)

> Document or fix CodecUtil's codec requirements/limitations
> ----------------------------------------------------------
>
>                 Key: LUCENE-9893
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9893
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 8.8
>            Reporter: Jason Gerlowski
>            Priority: Minor
>
> Lucene's {{CodecUtil}} has methods which do most of the heavy lifting for reading, writing, and validating the headers and footers used by _most_ Lucene codecs.
> But not all codecs use the standard header/footer format supported by {{CodecUtil}}.  {{SimpleTextCodec}} is one example: its avoidance of non-text data causes it to skip the standard footer format in favor of a custom text-based footer format.  {{CodecUtil.checkFooter}}, for example, when called with a SimpleText-based {{IndexInput}}, throws a {{CorruptIndexException}} because it doesn't find the magic number it expects at the start of the footer:
> {code}
> org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=808464432 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/home/jenkins/workspace/Lucene-Solr-8.x-Linux/solr/....")))
> {code}
> This undocumented limitation makes it hard for consumers to use {{CodecUtil}} generically for checksum validation in their code.  If it's the consumer's responsibility to check the codec before calling {{CodecUtil}}, then {{CodecUtil}} should mention this responsibility in its Javadocs.  Alternatively, if {{CodecUtil}} is meant to handle all codecs, then it needs some additional logic to handle the "oddball" codecs that don't use the standard footers.
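
The two numbers in the exception message above can be decoded by hand. The sketch below is not Lucene code and does not depend on Lucene: it only copies the value of {{CodecUtil.CODEC_MAGIC}} (0x3fd76c17), whose bitwise complement Lucene uses as the footer magic. The class and method names (FooterMagicDemo, asAscii, looksLikeStandardFooter) are hypothetical and exist only for illustration. It shows that the "expected footer" is exactly that complement, and that the "actual footer" is four ASCII '0' characters, which is consistent with the reader having landed on text rather than on a binary footer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, Lucene-free helper for decoding the values in the exception message.
public class FooterMagicDemo {
    // Same value as Lucene's CodecUtil.CODEC_MAGIC (copied here, not imported).
    public static final int CODEC_MAGIC = 0x3fd76c17;
    // Lucene's footer magic is the bitwise complement of the header magic.
    public static final int FOOTER_MAGIC = ~CODEC_MAGIC;

    // Interpret the four bytes of an int as ASCII characters (big-endian order).
    public static String asAscii(int value) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // True only if the int read at the footer position is the standard footer magic.
    public static boolean looksLikeStandardFooter(int leadingInt) {
        return leadingInt == FOOTER_MAGIC;
    }

    public static void main(String[] args) {
        System.out.println(FOOTER_MAGIC);                       // -1071082520
        System.out.println(asAscii(808464432));                 // 0000
        System.out.println(looksLikeStandardFooter(808464432)); // false
    }
}
```

A consumer that has already read the int at the footer position could apply the same comparison before deciding whether to call {{CodecUtil.checkFooter}}, though whether that check belongs in consumer code or in {{CodecUtil}} itself is exactly the question this issue raises.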



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
