Posted to issues@lucene.apache.org by "Julie Tibshirani (Jira)" <ji...@apache.org> on 2021/02/01 19:38:00 UTC
[jira] [Comment Edited] (LUCENE-9705) Move all codec formats to the o.a.l.codecs.Lucene90 package

    [ https://issues.apache.org/jira/browse/LUCENE-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276599#comment-17276599 ] 

Julie Tibshirani edited comment on LUCENE-9705 at 2/1/21, 7:37 PM:
-------------------------------------------------------------------

{quote}It's especially clear here where we must copy a lot of classes with no change at all, merely to clearly and consistently document the index version change.
{quote}
I’ll try to add some context since I suspect there might be a misunderstanding. In general, when there is a new major version, we *do not* plan to create all new index format classes. We only copy a class and move it to backwards-codecs when there is a change to that specific format, for example {{PointsFormat}}. This proposal applies only to the 9.0 release, and its main purpose is to support the work in LUCENE-9047 to move all formats to little endian. My understanding is that moving to little endian impacts all the formats and would be much cleaner if we used these fresh {{Lucene90*Format}} classes.
{quote}I wonder if we (eventually) should consider shifting to a versioning system that doesn't require new classes. Is this somehow a feature of the service discovery API that we use?
{quote}
We indeed load codecs (with their formats) through a service discovery API. If a user wants to read indices from a previous major version, they can depend on backwards-codecs so Lucene loads the correct older codec. As of LUCENE-9669, we allow reading indices back to version N-2.
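
To make the lookup concrete, here is a minimal Java sketch of resolving a codec by the name recorded with an index. It uses a plain map as a simplified stand-in for service discovery, and every class name in it ({{SketchCodec}}, {{SketchCodecRegistry}}, and so on) is hypothetical rather than part of Lucene's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a codec; not Lucene's real Codec class.
interface SketchCodec {
    String name();
}

// Imagined to ship with the core jar.
final class SketchLucene90Codec implements SketchCodec {
    public String name() { return "Lucene90"; }
}

// Imagined to ship in a separate backwards-codecs jar.
final class SketchLucene87Codec implements SketchCodec {
    public String name() { return "Lucene87"; }
}

// Simplified registry: real service discovery would find implementations
// on the classpath instead of requiring explicit register() calls.
final class SketchCodecRegistry {
    private final Map<String, SketchCodec> byName = new HashMap<>();

    void register(SketchCodec codec) {
        byName.put(codec.name(), codec);
    }

    // Resolve the codec whose name was recorded when the index was written.
    SketchCodec forName(String name) {
        SketchCodec codec = byName.get(name);
        if (codec == null) {
            throw new IllegalArgumentException(
                "No codec named '" + name + "' is registered");
        }
        return codec;
    }
}
```

Under these assumptions, asking for "Lucene87" fails until the older codec has been registered, which is meant to mirror having to depend on backwards-codecs before an older index can be read.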

I personally really like the current "copy-on-write" system for formats. There’s code duplication, but it has advantages over combining different version logic in the same file:
 * It’s really clear how each version behaves. Having a direct copy like {{Lucene70Codec}} is almost as if we were pulling in the codec jars from Lucene 7.0.
 * It decreases the risk of introducing bugs or accidental changes. If you’re making an enhancement to a new format, there’s little chance of changing the logic for an old format (since it lives in a separate class). This is especially important since older formats are not tested as thoroughly.
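
The trade-off can be sketched in Java using the endianness change as the example. This is only an illustration under assumed names: none of these classes exist in Lucene, and the real formats do far more than read a single int.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical names throughout; none of these are real Lucene classes.
interface SketchIntReader {
    int readInt(byte[] bytes);
}

// "Copy-on-write": the old format is a frozen copy that reads big-endian
// and is never edited again once it moves to backwards-codecs.
final class SketchOldFormatReader implements SketchIntReader {
    public int readInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt();
    }
}

// The new format gets a fresh class that reads little-endian.
final class SketchNewFormatReader implements SketchIntReader {
    public int readInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }
}

// The alternative: one class serving every version, where each change
// has to be threaded through a version check.
final class SketchVersionCheckedReader {
    static int readInt(byte[] bytes, int majorVersion) {
        ByteOrder order = majorVersion >= 9
            ? ByteOrder.LITTLE_ENDIAN
            : ByteOrder.BIG_ENDIAN;
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }
}
```

With separate classes, a later change to {{SketchNewFormatReader}} cannot alter how old bytes are decoded. In the version-checked variant, every edit touches code that the older versions also run through.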

I started to appreciate it after experiencing the alternative in Elasticsearch, where we’re constantly bumping into if/else version checks when making changes.



> Move all codec formats to the o.a.l.codecs.Lucene90 package
> -----------------------------------------------------------
>
>                 Key: LUCENE-9705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9705
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Ignacio Vera
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Current formats are distributed across different packages, prefixed with the Lucene version in which they were created. With the upcoming release of Lucene 9.0, it would be nice to move all those formats to just the o.a.l.codecs.Lucene90 package (and, of course, move the current ones to backwards-codecs).
> This issue would also facilitate moving the directory API to little endian (LUCENE-9047), as the only codecs that would need to handle backwards compatibility would be those in backwards-codecs.
> In addition, it can help formalise the use of internal versions vs. format versioning (LUCENE-9616).
>  


