Posted to dev@arrow.apache.org by Ian Cook <ia...@ursacomputing.com> on 2022/12/09 02:50:41 UTC
Re: [DISCUSS] Maintenance policy

This topic was discussed in the Arrow sync call this week. See the
notes from that call here:
https://lists.apache.org/thread/gbywpzbvpfydq24m1c0w6jgybnsrf9xm

Ian

On Wed, Nov 23, 2022 at 7:36 AM Benson Muite <be...@emailplus.org> wrote:
>
> On 10/19/22 20:47, Will Jones wrote:
> > One particular type of defect we might want to consider backporting to
> > supported versions is the kind that silently produces incorrect data. Unlike
> > defects that cause a crash, it's not easy for a user to know they are affected.
> >
> > Here are a few examples:
> >
> >   * ARROW-17453: [Go][C++][Parquet] Inconsistent Data with Repetition Levels
> > [1] (fixed in 10.0.0)
> >   * ARROW-17995: [C++] Fix json decimals not being rescaled based on the
> > explicit schema [2] (fixed in 10.0.0)
> >   * ARROW-14523: [C++] Fix potential data loss in S3 multipart upload [3]
> > (fixed in 7.0.0)
> >
> > Also, I know we have high release costs for new versions, but is that also
> > true for backporting fixes? Unlike new releases, if we were creating a
> > bugfix release, we are presumably starting from a much more stable point,
> > right?
> >
> > Thanks,
> >
> > Will Jones
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-17453
> > [2] https://issues.apache.org/jira/browse/ARROW-17995
> > [3] https://issues.apache.org/jira/browse/ARROW-14523
> >
> > On Wed, Oct 19, 2022 at 9:32 AM Todd Farmer <to...@voltrondata.com.invalid>
> > wrote:
> >
> >> Hi,
> >>
> >> I've been thinking a lot about maintenance and lifecycle policies and
> >> defect classification recently - I'm very grateful this is being raised. I
> >> believe establishing such policies will prove instrumental in enabling
> >> adoption of Arrow for a number of use cases that prioritize stability over
> >> innovation.
> >>
> >> On Wed, Oct 19, 2022 at 5:42 AM Antoine Pitrou <an...@python.org> wrote:
> >>
> >>>
> >>> Hi Kou,
> >>>
> >>> Le 19/10/2022 à 06:29, Sutou Kouhei a écrit :
> >>>>
> >>>> My proposal: We maintain the last major release:
> >>>> * We maintain 9.Y.Z when the latest major release is 9.0.0
> >>>> * We may release 9.Y.Z when we find a problem such as a
> >>>>     security vulnerability in 9.Y.Z
> >>>> * We drop support for 9.Y.Z when we release 10.0.0
> >>>
> >>> That sounds ok to me, but is there a more precise criterion than "we
> >>> find a problem"?
> For most users, backwards compatibility and supported platforms are
> likely more important than the version number.  If there are many
> breaking API changes, this increases the cost of using Arrow, so
> supporting easy continuous use of Arrow should be the goal.
> >>>
> >>> In the past, we have from time to time done maintenance releases based
> >>> on annoying bugs/regressions. But not always.
> >>>
> >>
> >> I very much agree, and actually think there are multiple questions to
> >> answer here:
> >>
> >> 1. Which class of defects should be allowed to be merged into a maintenance
> >> branch?
> >> 2. Which class of defects must be fixed in a supported maintenance branch?
> >> 3. Which class of defects should trigger a maintenance release once a fix
> >> is made to the branch?
> >> 4. Which versions should be targeted in backporting a defect fix?  How long
> >> will a release receive maintenance support?
> >> 5. Which class of defects can be batched into a future maintenance release,
> >> and which need immediate release?
> >> 6. What delivery artifacts are needed for maintenance releases? Can some
> >> things be source-only?
> >>
> >> Today, any fix may be a candidate for backporting to a maintenance branch
> >> if there's support for doing so in a vote. I believe it might be useful to
> >> more formally triage defects in part to establish policy answering these
> >> questions. For example:
> >>
> >> * How severe is the defect?  Does it produce wrong results? Cause crashes?
> >> Or is it an annoying spelling error in a log message?
> >> * How widespread is the impact? Is everybody who uses Arrow going to be
> >> affected by this? Or is it only triggered by some very obscure use case?
> >> * How accessible is any workaround?
> >> * How much risk is involved in a fix?
> >>
> >> Having a common framework to classify those elements above would enable
> >> policy that clearly defines which defects can (or should, eventually) get
> >> what attention.
> >>
> >> If there is interest in the community, I'll continue a draft proposal I'm
> >> working on to formalize triage to capture these aspects. Any such triage
> >> process would be entirely optional for work done against master/main, but
> >> could be required for assessing potential backports as needed.
> >>
> >> I'll also note that I recognize Arrow may not currently see a need to
> >> answer all the questions about maintenance/lifecycle policy today, or may
> >> not have the resources needed to deliver what may be desired. It takes a
> >> lot of work to generate a release today. I think it's completely
> >> appropriate to commit only to what can be delivered today, with an eye
> >> towards incremental improvement. For example, an entirely acceptable policy
> >> might be:
> >>
> >> * Only the most recently-released minor version is eligible for defect
> >> fixes.
> >> * Security vulnerabilities with CVSS 3.0 score >= 7.0 (High) should trigger
> >> a maintenance release.
> >> * Fixes for defects of any nature may be backported if they reach
> >> established thresholds (TBD) for severity, widespread impact, workaround
> >> accessibility and risk. Such fixes will be incorporated into the relevant
> >> maintenance branch and made available via source, but no release will be
> >> produced unless triggered by a subsequent security vulnerability fix.
> >>
> It may be good to disclose known problems on a site associated with the
> release.  Bug tickets are helpful for work in progress, but "won't fix" or
> "cannot fix" resolutions associated with a release may be hard to find.  As
> an example, in the current releases 10.0.0 and 10.0.1, there are problems
> with old Glibc on CentOS7 producing incorrect results for a timestamp
> comparison.  It is unlikely this will be fixed, but it may be something
> users want to be aware of.
> >>
> >>>> I think that we can maintain multiple major releases
> >>>> without high release cost by implementing the following:
> >>>> * Green nightly CI
> >>>> * Nightly CI for all maintained branches (maint-X.Y.Z)
> >>>>     * We need to reduce the time CI takes
> >>>> * ...
> >>>
> >>> I'm afraid "green nightly CI" is more of an ideal than a reality given
> >>> the breadth and complexity of our fleet of CI jobs.  We still seem to
> >>> have stability problems in some areas (perhaps Acero?) but there are
> >>> also regular regressions due to changes in third-party packages.
> >>>
> >>
> >> Would this still be true if executed against a maintenance release branch?
> >> I understand why this would drift for main/master, but if a version branch
> >> is green when first released, and only accepts limited, qualified
> >> backported fixes, it should be much easier to "keep" green, I'd think.
> >>
> >> Thanks,
> >>
> >> Todd
> >>
> >
>