Posted to user@helix.apache.org by Brent <br...@gmail.com> on 2022/07/18 22:34:20 UTC

Backward-incompatible Zookeeper change in Helix v1.0.4

Hey Helix folks,

We ran into a fun issue recently.  Between the time that Apache Helix
v1.0.3 was released on April 14 and v1.0.4 was released on June 9, it looks
like a backward-incompatible change may have been introduced on June 3rd
that makes Helix v1.0.4 not work correctly on Zookeeper 3.4.x clusters.

I do acknowledge that Zookeeper 3.4.x was end-of-lifed on June 1st 2020 (
https://lists.apache.org/thread/xckr6nnsg9rxchkbvltkvt7hr2d0mhbo), so
that certainly factors in, but it's the version our organization's team
supports.  So unfortunately we're stuck between a rock and a hard
place at the moment:
- We can't go back to v1.0.2 because it lacks the Log4j fixes
- We can't use v1.0.3 due to the corruption issue
- We can't move ahead to v1.0.4 due to the compatibility issue with
  Zookeeper

I have a fork we were previously using (
https://github.com/brentwritescode/helix/releases/tag/1.0.2-with-log4j-2.17.1),
but that's not a long-term solution either.

The issue is a bit subtle.  From v1.0.2 to v1.0.3, the org.apache.zookeeper
version requirement in the helix/zookeeper-api was bumped from 3.4.13 to
3.5.9:
- v1.0.2:
https://github.com/apache/helix/blob/c219050f8dc02c25451493f96575b56fabbf2c1e/zookeeper-api/pom.xml#L58
- v1.0.3:
https://github.com/apache/helix/blob/46b705f7d47990fa7bf1feeb6c64457e3d80af22/zookeeper-api/pom.xml#L54
So that, in and of itself, was not breaking.

And then from v1.0.3 to v1.0.4, some code changes were introduced in this
PR (https://github.com/apache/helix/pull/2138/files) that relied
specifically on that 3.5.x Zookeeper version.  For example, the "import
org.apache.zookeeper.AsyncCallback.Create2Callback" that was added to
"helix/zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java"
in that PR introduces a backward-incompatible change.
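
One way to check which side of this change a given deployment is on is to
probe the classpath for the callback interface named above.  The following is
only a minimal sketch: ZkClientCompat and isClassPresent are hypothetical
names, not part of Helix or Zookeeper, and the check assumes (as described
above) that Create2Callback is only present in 3.5.x and later client jars.

```java
// Hypothetical helper for illustration; not part of Helix or Zookeeper.
final class ZkClientCompat {
    // Binary name of the nested interface that the v1.0.4 import refers to.
    static final String CREATE2_CALLBACK =
        "org.apache.zookeeper.AsyncCallback$Create2Callback";

    private ZkClientCompat() {}

    // Returns true if the named class can be located on the classpath.
    // The class is looked up without being initialized.
    static boolean isClassPresent(String binaryName) {
        try {
            Class.forName(binaryName, false, ZkClientCompat.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Reports whether the Zookeeper client jar on the classpath ships the interface.
        System.out.println("Create2Callback available: "
            + isClassPresent(CREATE2_CALLBACK));
    }
}
```

Note that this only inspects the client library on the classpath; it says
nothing about the version of the Zookeeper servers being connected to.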

So the net result is that, unfortunately, there has been a drift over the
past two versions (from v1.0.2 to v1.0.4) that has rendered Zookeeper 3.4.x
clusters incompatible with Apache Helix.

I wanted to post this here:

1.  To see if you were all aware of it (since it may hit other customers as
well and we were a bit blind-sided by it)
2.  To see if you had any ideas on how to work with/around this

Our long-term plan will obviously be to get on newer Zookeeper clusters as
soon as we can, but that's likely not going to be a quick turn-around for us.
In the short-term we'll need to revert to our v1.0.2 fork.

Does the team happen to have any other comments or suggestions on dealing
with this issue?  Is this correctable at the project level (I suspect that
will be tough)?

Thanks much!

~Brent

Re: Backward-incompatible Zookeeper change in Helix v1.0.4

Posted by Wang Jiajun <er...@gmail.com>.
If Helix components have not actually started using TTL, I believe it is
doable (although risky) to build Helix against the newer ZK lib version and
connect to older ZK servers. Otherwise, if TTL is already used, then I
don't think there is a way to support older versions without creating a
parallel branch.

My feeling is that Helix internally does not need TTL for now (correct me
if I am wrong). In this case, we can keep the older ZK version as the default,
but release a separate zookeeper-lib built for the new ZK version for the
customers who need it.
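
To illustrate the "newer lib, older servers" idea, here is a minimal sketch
of a guard that only permits features needing a newer server when the server
version is high enough.  ZkVersionGate and isAtLeast are hypothetical names,
the version string is assumed to be obtained by the caller, and the 3.5
threshold is taken from this thread rather than from Zookeeper documentation.

```java
// Hypothetical helper for illustration; not part of Helix or Zookeeper.
final class ZkVersionGate {
    private ZkVersionGate() {}

    // Parses "major.minor[.patch][-suffix]" and returns true if the version
    // is greater than or equal to the given major.minor.
    static boolean isAtLeast(String version, int major, int minor) {
        String[] parts = version.trim().split("[.\\-]");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unrecognized version: " + version);
        }
        int maj = Integer.parseInt(parts[0]);
        int min = Integer.parseInt(parts[1]);
        return maj > major || (maj == major && min >= minor);
    }

    public static void main(String[] args) {
        // Assumed threshold from the thread: 3.5.x-only features stay off on 3.4.x.
        String serverVersion = args.length > 0 ? args[0] : "3.4.13";
        System.out.println("3.5.x-only features allowed: "
            + isAtLeast(serverVersion, 3, 5));
    }
}
```

A guard like this only helps if every code path that needs the newer server
is routed through it, which is where the risk mentioned above comes from.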

Best Regards,
Jiajun


On Mon, Jul 18, 2022 at 4:27 PM Junkai Xue <jx...@apache.org> wrote:

> Thanks Brent for raising this concern! Previously, we were not aware of
> this issue of ZK level backward incompatibility.
>
> I think you can submit the log4j patch to the 1.0.2 branch in Apache Helix
> to make it a hotfix. But I am not sure whether we can do a release for that
> as long as there is no build number version in Apache Helix.
>
> I added to the dev list to see whether there are any other suggestions for
> this scenario or not.
>
> Best,
>
> Junkai
>
> On Mon, Jul 18, 2022 at 3:34 PM Brent <br...@gmail.com> wrote:
>
> > Hey Helix folks,
> >
> > We ran into a fun issue recently.  Between the time that Apache Helix
> > v1.0.3 was released on April 14 and v1.0.4 was released on June 9, it
> > looks like a backward-incompatible change may have been introduced on
> > June 3rd that makes Helix v1.0.4 not work correctly on Zookeeper 3.4.x
> > clusters.
> >
> > I do acknowledge that Zookeeper 3.4.x was end-of-lifed on June 1st 2020 (
> > https://lists.apache.org/thread/xckr6nnsg9rxchkbvltkvt7hr2d0mhbo), so
> > obviously that certainly factors in, but it's what our organizational
> team
> > is supporting.  So unfortunately we're stuck between a rock and a hard
> > place at the moment:
> > - We can't go back to v1.0.2 because it lacks the Log4j fixes
> > - We can't use v1.0.3 due to the corruption issue
> > - We can't move ahead to v1.0.4 due to the compatibility issue with
> > Zookeeper
> > I have a fork we were previously using (
> >
> https://github.com/brentwritescode/helix/releases/tag/1.0.2-with-log4j-2.17.1
> ),
> > but that's not a long-term solution either.
> >
> > The issue is a bit subtle.  From v1.0.2 to v1.0.3, the
> > org.apache.zookeeper version requirement in the helix/zookeeper-api was
> > bumped from 3.4.13 to 3.5.9:
> > - v1.0.2:
> >
> https://github.com/apache/helix/blob/c219050f8dc02c25451493f96575b56fabbf2c1e/zookeeper-api/pom.xml#L58
> > - v1.0.3:
> >
> https://github.com/apache/helix/blob/46b705f7d47990fa7bf1feeb6c64457e3d80af22/zookeeper-api/pom.xml#L54
> > So that, in and of itself, was not breaking.
> >
> > And then from v1.0.3 to v1.0.4, some code changes were introduced in this
> > PR (https://github.com/apache/helix/pull/2138/files) that relied
> > specifically on that 3.5.x Zookeeper version.  For example, the "import
> > org.apache.zookeeper.AsyncCallback.Create2Callback" that was added to
> >
> "helix/zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java"
> > in that PR introduces a backward incompatible change.
> >
> > So the net result is that, unfortunately, there has been a drift over the
> > past two versions (from v1.0.2 to v1.0.4) that has rendered Zookeeper
> 3.4.x
> > clusters incompatible with Apache Helix.
> >
> > I wanted to post this here:
> >
> > 1.  To see if you were all aware of it (since it may hit other customers
> > as well and we were a bit blind-sided by it)
> > 2.  To see if you had any ideas on how to work with/around this
> >
> > Our long-term plan will obviously be to get on newer Zookeeper clusters
> as
> > we can, but that's likely not going to be a quick turn-around for us.  In
> > the short-term we'll need to revert back to our v1.0.2 fork.
> >
> > Does the team happen to have any other comments or suggestions on dealing
> > with this issue?  Is this correctable at the project level (I suspect
> that
> > will be tough)?
> >
> > Thanks much!
> >
> > ~Brent
> >
>

Re: Backward-incompatible Zookeeper change in Helix v1.0.4

Posted by Junkai Xue <jx...@apache.org>.
Thanks Brent for raising this concern! Previously, we were not aware of
this issue of ZK level backward incompatibility.

I think you can submit the log4j patch to the 1.0.2 branch in Apache Helix
to make it a hotfix. But I am not sure whether we can do a release for that
as long as there is no build number version in Apache Helix.

I have added the dev list to see whether there are any other suggestions for
this scenario.

Best,

Junkai

On Mon, Jul 18, 2022 at 3:34 PM Brent <br...@gmail.com> wrote:

> Hey Helix folks,
>
> We ran into a fun issue recently.  Between the time that Apache Helix
> v1.0.3 was released on April 14 and v1.0.4 was released on June 9, it looks
> like a backward-incompatible change may have been introduced on June 3rd
> that makes Helix v1.0.4 not work correctly on Zookeeper 3.4.x clusters.
>
> I do acknowledge that Zookeeper 3.4.x was end-of-lifed on June 1st 2020 (
> https://lists.apache.org/thread/xckr6nnsg9rxchkbvltkvt7hr2d0mhbo), so
> obviously that certainly factors in, but it's what our organizational team
> is supporting.  So unfortunately we're stuck between a rock and a hard
> place at the moment:
> - We can't go back to v1.0.2 because it lacks the Log4j fixes
> - We can't use v1.0.3 due to the corruption issue
> - We can't move ahead to v1.0.4 due to the compatibility issue with
> Zookeeper
> I have a fork we were previously using (
> https://github.com/brentwritescode/helix/releases/tag/1.0.2-with-log4j-2.17.1),
> but that's not a long-term solution either.
>
> The issue is a bit subtle.  From v1.0.2 to v1.0.3, the
> org.apache.zookeeper version requirement in the helix/zookeeper-api was
> bumped from 3.4.13 to 3.5.9:
> - v1.0.2:
> https://github.com/apache/helix/blob/c219050f8dc02c25451493f96575b56fabbf2c1e/zookeeper-api/pom.xml#L58
> - v1.0.3:
> https://github.com/apache/helix/blob/46b705f7d47990fa7bf1feeb6c64457e3d80af22/zookeeper-api/pom.xml#L54
> So that, in and of itself, was not breaking.
>
> And then from v1.0.3 to v1.0.4, some code changes were introduced in this
> PR (https://github.com/apache/helix/pull/2138/files) that relied
> specifically on that 3.5.x Zookeeper version.  For example, the "import
> org.apache.zookeeper.AsyncCallback.Create2Callback" that was added to
> "helix/zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java"
> in that PR introduces a backward incompatible change.
>
> So the net result is that, unfortunately, there has been a drift over the
> past two versions (from v1.0.2 to v1.0.4) that has rendered Zookeeper 3.4.x
> clusters incompatible with Apache Helix.
>
> I wanted to post this here:
>
> 1.  To see if you were all aware of it (since it may hit other customers
> as well and we were a bit blind-sided by it)
> 2.  To see if you had any ideas on how to work with/around this
>
> Our long-term plan will obviously be to get on newer Zookeeper clusters as
> we can, but that's likely not going to be a quick turn-around for us.  In
> the short-term we'll need to revert back to our v1.0.2 fork.
>
> Does the team happen to have any other comments or suggestions on dealing
> with this issue?  Is this correctable at the project level (I suspect that
> will be tough)?
>
> Thanks much!
>
> ~Brent
>
