Posted to dev@helix.apache.org by Junkai Xue <jx...@apache.org> on 2022/07/18 23:26:58 UTC

Re: Backward-incompatible Zookeeper change in Helix v1.0.4

Thanks Brent for raising this concern! Previously, we were not aware of
this issue of ZK level backward incompatibility.

I think you can submit the log4j patch to the 1.0.2 branch in Apache Helix
as a hotfix. But I am not sure whether we can do a release for that,
since there is no build number in the Apache Helix version scheme.

I added the dev list to see whether there are any other suggestions for
this scenario.

Best,

Junkai

On Mon, Jul 18, 2022 at 3:34 PM Brent <br...@gmail.com> wrote:

> Hey Helix folks,
>
> We ran into a fun issue recently.  Between the time that Apache Helix
> v1.0.3 was released on April 14 and v1.0.4 was released on June 9, it looks
> like a backward-incompatible change may have been introduced on June 3rd
> that makes Helix v1.0.4 not work correctly on Zookeeper 3.4.x clusters.
>
> I do acknowledge that Zookeeper 3.4.x was end-of-lifed on June 1st 2020 (
> https://lists.apache.org/thread/xckr6nnsg9rxchkbvltkvt7hr2d0mhbo), so
> that certainly factors in, but it's what our organizational team
> is supporting.  So unfortunately we're stuck between a rock and a hard
> place at the moment:
> - We can't go back to v1.0.2 because it lacks the Log4j fixes
> - We can't use v1.0.3 due to the corruption issue
> - We can't move ahead to v1.0.4 due to the compatibility issue with
> Zookeeper
> I have a fork we were previously using (
> https://github.com/brentwritescode/helix/releases/tag/1.0.2-with-log4j-2.17.1),
> but that's not a long-term solution either.
>
> The issue is a bit subtle.  From v1.0.2 to v1.0.3, the
> org.apache.zookeeper version requirement in helix/zookeeper-api was
> bumped from 3.4.13 to 3.5.9:
> - v1.0.2:
> https://github.com/apache/helix/blob/c219050f8dc02c25451493f96575b56fabbf2c1e/zookeeper-api/pom.xml#L58
> - v1.0.3:
> https://github.com/apache/helix/blob/46b705f7d47990fa7bf1feeb6c64457e3d80af22/zookeeper-api/pom.xml#L54
> So that, in and of itself, was not breaking.
>
> And then from v1.0.3 to v1.0.4, some code changes were introduced in this
> PR (https://github.com/apache/helix/pull/2138/files) that relied
> specifically on that 3.5.x Zookeeper version.  For example, the "import
> org.apache.zookeeper.AsyncCallback.Create2Callback" that was added to
> "helix/zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java"
> in that PR introduces a backward incompatible change.
>
> So the net result is that, unfortunately, there has been a drift over the
> past two versions (from v1.0.2 to v1.0.4) that has rendered Zookeeper 3.4.x
> clusters incompatible with Apache Helix.
>
> I wanted to post this here:
>
> 1.  To see if you were all aware of it (since it may hit other customers
> as well and we were a bit blind-sided by it)
> 2.  To see if you had any ideas on how to work with/around this
>
> Our long-term plan will obviously be to get on newer Zookeeper clusters as
> we can, but that's likely not going to be a quick turn-around for us.  In
> the short-term we'll need to revert back to our v1.0.2 fork.
>
> Does the team happen to have any other comments or suggestions on dealing
> with this issue?  Is this correctable at the project level (I suspect that
> will be tough)?
>
> Thanks much!
>
> ~Brent
>

Re: Backward-incompatible Zookeeper change in Helix v1.0.4

Posted by Wang Jiajun <er...@gmail.com>.
If Helix components have not actually started using TTL, I believe it is
doable (although risky) to build Helix with the newer ZK lib version and
connect to older ZK servers. Otherwise, if TTL is already used, then I
don't think there is a way to support older versions without creating a
parallel branch.

My feeling is that Helix internally does not need TTL for now (correct me
if I am wrong). In this case, we can keep the older ZK version as the default,
but release a separate zookeeper-lib built against the new ZK version for
customers who need it.
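
As a rough illustration of the client-library side of this (a sketch only,
not Helix code): a caller could probe at runtime whether the ZooKeeper jar
on the classpath exposes the 3.5.x-only AsyncCallback.Create2Callback type
mentioned earlier in the thread, and choose a code path accordingly. The
class ZkClientFeatureProbe and its method names are hypothetical. Note that
this only inspects the client jar; it says nothing about the version of the
ZK server being connected to.

```java
/**
 * Hypothetical helper (not part of Helix or ZooKeeper): checks whether a
 * given class is loadable from the current classpath without initializing it.
 */
public final class ZkClientFeatureProbe {

    // Binary name of the nested callback type referenced in the thread.
    // Nested classes use '$' in their binary names.
    public static final String CREATE2_CALLBACK =
            "org.apache.zookeeper.AsyncCallback$Create2Callback";

    private ZkClientFeatureProbe() {
    }

    /** Returns true if the named class can be loaded, false otherwise. */
    public static boolean isClassPresent(String binaryName) {
        try {
            // 'false' = do not run static initializers; we only test presence.
            Class.forName(binaryName, false,
                    ZkClientFeatureProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True if the ZooKeeper client jar on the classpath has Create2Callback. */
    public static boolean clientSupportsCreate2() {
        return isClassPresent(CREATE2_CALLBACK);
    }

    public static void main(String[] args) {
        System.out.println("Create2Callback available: " + clientSupportsCreate2());
    }
}
```

A probe like this would only help if the code referencing the newer types is
isolated behind it (for example in a separately loaded class); a direct
import in a commonly loaded class still fails against the older jar.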

Best Regards,
Jiajun


On Mon, Jul 18, 2022 at 4:27 PM Junkai Xue <jx...@apache.org> wrote:

> Thanks Brent for raising this concern! Previously, we were not aware of
> this issue of ZK level backward incompatibility.
>
> I think you can submit the log4j patch to the 1.0.2 branch in Apache Helix
> as a hotfix. But I am not sure whether we can do a release for that,
> since there is no build number in the Apache Helix version scheme.
>
> I added the dev list to see whether there are any other suggestions for
> this scenario.
>
> Best,
>
> Junkai
>
> On Mon, Jul 18, 2022 at 3:34 PM Brent <br...@gmail.com> wrote:
>
> > Hey Helix folks,
> >
> > We ran into a fun issue recently.  Between the time that Apache Helix
> > v1.0.3 was released on April 14 and v1.0.4 was released on June 9, it
> > looks like a backward-incompatible change may have been introduced on
> > June 3rd that makes Helix v1.0.4 not work correctly on Zookeeper 3.4.x
> > clusters.
> >
> > I do acknowledge that Zookeeper 3.4.x was end-of-lifed on June 1st 2020 (
> > https://lists.apache.org/thread/xckr6nnsg9rxchkbvltkvt7hr2d0mhbo), so
> > that certainly factors in, but it's what our organizational team
> > is supporting.  So unfortunately we're stuck between a rock and a hard
> > place at the moment:
> > - We can't go back to v1.0.2 because it lacks the Log4j fixes
> > - We can't use v1.0.3 due to the corruption issue
> > - We can't move ahead to v1.0.4 due to the compatibility issue with
> > Zookeeper
> > I have a fork we were previously using (
> > https://github.com/brentwritescode/helix/releases/tag/1.0.2-with-log4j-2.17.1),
> > but that's not a long-term solution either.
> >
> > The issue is a bit subtle.  From v1.0.2 to v1.0.3, the
> > org.apache.zookeeper version requirement in helix/zookeeper-api was
> > bumped from 3.4.13 to 3.5.9:
> > - v1.0.2:
> > https://github.com/apache/helix/blob/c219050f8dc02c25451493f96575b56fabbf2c1e/zookeeper-api/pom.xml#L58
> > - v1.0.3:
> > https://github.com/apache/helix/blob/46b705f7d47990fa7bf1feeb6c64457e3d80af22/zookeeper-api/pom.xml#L54
> > So that, in and of itself, was not breaking.
> >
> > And then from v1.0.3 to v1.0.4, some code changes were introduced in this
> > PR (https://github.com/apache/helix/pull/2138/files) that relied
> > specifically on that 3.5.x Zookeeper version.  For example, the "import
> > org.apache.zookeeper.AsyncCallback.Create2Callback" that was added to
> > "helix/zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java"
> > in that PR introduces a backward incompatible change.
> >
> > So the net result is that, unfortunately, there has been a drift over the
> > past two versions (from v1.0.2 to v1.0.4) that has rendered Zookeeper
> > 3.4.x clusters incompatible with Apache Helix.
> >
> > I wanted to post this here:
> >
> > 1.  To see if you were all aware of it (since it may hit other customers
> > as well and we were a bit blind-sided by it)
> > 2.  To see if you had any ideas on how to work with/around this
> >
> > Our long-term plan will obviously be to get on newer Zookeeper clusters
> > as we can, but that's likely not going to be a quick turn-around for us.
> > In the short-term we'll need to revert back to our v1.0.2 fork.
> >
> > Does the team happen to have any other comments or suggestions on dealing
> > with this issue?  Is this correctable at the project level (I suspect
> > that will be tough)?
> >
> > Thanks much!
> >
> > ~Brent
> >
>
