Posted to user@bigtop.apache.org by Luca Toscano <to...@gmail.com> on 2020/07/15 13:07:09 UTC

Re: Testing rollback after HDFS upgrade

Hi everybody,

I didn't get the time to work on this until recently, but I finally
managed to put together a reliable procedure to upgrade from CDH to
Bigtop 1.4 and roll back if needed. The assumptions are:

1) It is ok to have (limited) cluster downtime.
2) Rolling upgrade is not needed.
3) QJM is used.

The procedure is listed in these two scripts:

https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py

The code is highly dependent on my working environment, but it should
be straightforward to follow when writing a tutorial about how to
migrate from CDH to Bigtop. All the suggestions from this mailing list
were really useful in reaching a solution!
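Since the scripts are environment-specific, here is a minimal shell sketch of the non-rolling flow they implement, under the assumptions above. The `hdfs` commands are the standard upgrade/rollback entry points; the systemd/apt names and the version pin are hypothetical, and the `DRY_RUN` guard only prints the commands:

```shell
# Dry-run sketch of a non-rolling CDH -> Bigtop upgrade/rollback.
# Service and package names are assumptions; adapt to your environment.
DRY_RUN="echo DRY-RUN:"   # drop this guard to actually run the commands

upgrade_steps() {
  $DRY_RUN hdfs dfsadmin -safemode enter            # stop new writes
  $DRY_RUN hdfs dfsadmin -saveNamespace             # checkpoint a fresh fsimage
  $DRY_RUN systemctl stop "hadoop-hdfs-*"           # on every node: stop HDFS daemons
  $DRY_RUN apt-get install -y hadoop-hdfs-namenode  # swap CDH packages for Bigtop ones
  $DRY_RUN hdfs namenode -upgrade                   # start NameNode, writing the rollback image
}

rollback_steps() {
  $DRY_RUN systemctl stop "hadoop-hdfs-*"
  $DRY_RUN apt-get install -y "hadoop-hdfs-namenode=<cdh-version>"  # hypothetical version pin
  $DRY_RUN hdfs namenode -rollback                  # restore the pre-upgrade fsimage
}

finalize_steps() {
  # Point of no return: DataNodes then delete their previous/ directories.
  $DRY_RUN hdfs dfsadmin -finalizeUpgrade
}

upgrade_steps
```

Dropping the `DRY_RUN` guard would execute the commands for real; `finalize_steps` is the point of no return, after which no rollback is possible.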

My next steps will be:

1) Keep testing Bigtop 1.4 (finalize the HDFS upgrade, run more Hadoop
jobs, test Hive 2, etc.).
2) Upgrade the production Hadoop cluster to Bigtop 1.4 on Debian 9
(HDFS 2.6.0-cdh -> 2.8.5).
3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
4) Upgrade to Debian 10.

With automation it shouldn't be very difficult; I'll report progress as I make it.

Thanks a lot!

Luca

On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <to...@gmail.com> wrote:
>
> Hi Evans,
>
> thanks a lot for the feedback, it was exactly what I needed. "The
> simpler the better" is definitely good advice in this use case; I'll
> try another rollout/rollback this week and report back :)
>
> Luca
>
> On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <ev...@apache.org> wrote:
> >
> > Hi Luca,
> >
> > Thanks for reporting back and letting us know how it goes.
> > I don't have experience with exactly this HDFS QJM HA upgrade. What I did was a 0.20 non-HA upgrade to 2.0 non-HA, followed by enabling QJM HA, back in 2014.
> >
> > Regarding rollback, I think you're right:
> >
> > it is possible to rollback to HDFS’ state before the upgrade in case of unexpected problems.
> >
> > My previous experience is the same: the rollback is merely a snapshot taken before the upgrade. The further you've gone, the more data a rollback costs... Our runbook was: if our sanity check failed during the upgrade downtime, we performed the rollback immediately.
> >
> > Regarding that FSImage hole issue, I've experienced it as well.
> > I managed to fix it by manually editing the FSImage with the offline image viewer [1] and deleting the reference to the missing edit log. That actually brought my cluster back with only a small number of missing blocks.
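A sketch of the dump-and-rebuild round trip that this kind of repair relies on, using the offline image viewer from [1]. The fsimage path is hypothetical, the exact XML edit depends on what is broken, and the `DRY_RUN` guard only prints the commands (the `ReverseXML` processor is available in Hadoop 2.8+):

```shell
DRY_RUN="echo DRY-RUN:"
IMG=/var/lib/hadoop/name/current/fsimage_0000000000000001000   # hypothetical path

repair_fsimage() {
  $DRY_RUN hdfs oiv -p XML -i "$IMG" -o /tmp/fsimage.xml   # dump the image to editable XML
  # ... hand-edit /tmp/fsimage.xml here to drop the broken reference ...
  $DRY_RUN hdfs oiv -p ReverseXML -i /tmp/fsimage.xml -o /tmp/fsimage.repaired   # rebuild a binary image
}

repair_fsimage
```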
> >
> > Our experience is that the more steps there are, the higher the chance the upgrade fails. We did well across dozens of test runs on the DEV and STAGING clusters, but still got missing blocks when upgrading production...
> >
> > The suggestion is to get your production cluster in good shape first (the fewer decommissioned or offline DataNodes and disk failures, the better).
> > Also, maybe you can switch to non-HA mode and do the upgrade, to simplify things?
> >
> > Not much help, but please let us know of any progress.
> > One last thing: have you reached out to the Hadoop community? The authors should know best :)
> >
> > - Evans
> >
> > [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
> >
> > Luca Toscano <to...@gmail.com> wrote on Wed, Apr 8, 2020 at 21:03:
> >>
> >> Hi everybody,
> >>
> >> most of the bugs/issues/etc. that I found while upgrading from CDH 5
> >> to BigTop 1.4 are fixed. I am now testing (as also suggested here)
> >> upgrade/rollback procedures for HDFS (everything is written up in
> >> https://phabricator.wikimedia.org/T244499; I promise to add
> >> documentation about this at the end).
> >>
> >> I initially followed [1][2] in my test cluster, choosing the rolling
> >> upgrade, but when I tried to roll back (days after the initial
> >> upgrade) I ended up in an inconsistent state and wasn't able to
> >> recover the previous HDFS state. I didn't save the exact error
> >> messages, but the situation was more or less the following:
> >>
> >> FS-Image-rollback (created at the time of the upgrade): up to transaction X
> >> FS-Image-current: up to transaction Y, with Y = X + 10000 (number
> >> totally made up for the example)
> >> QJM cluster: first available transaction Z = Y + 1 = X + 10000 + 1
> >>
> >> When I tried the rolling rollback, the Namenode complained about a hole
> >> in the transaction log, namely at X + 1, so it refused to start. I
> >> tried to force a regular rollback, but the Namenode refused again,
> >> saying that there was no available FS image to roll back to. I checked
> >> in the Hadoop code, and indeed the Namenode saves the fs image under a
> >> different name/path for a rolling upgrade versus a regular upgrade.
> >> Both refusals make sense, especially the first one, since there was
> >> indeed a hole between the last transaction of the FS-Image-rollback
> >> and the first transaction available to replay on the QJM cluster. I
> >> chose the rolling upgrade initially since it was appealing: it
> >> promises to bring the Namenodes back to their previous versions while
> >> keeping the data modified between upgrade and rollback.
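The refusal boils down to a contiguity check on transaction ids: the rollback image ends at X, so a rollback needs every edit from X + 1 onward, while the JournalNodes only retained edits from Z onward. A toy sketch of that condition, with made-up numbers matching the example above:

```shell
can_replay() {
  last_img_txid=$1    # X: last transaction in the rollback fsimage
  first_qjm_txid=$2   # Z: first transaction still available on the QJM cluster
  if [ "$first_qjm_txid" -le $(( last_img_txid + 1 )) ]; then
    echo "rollback possible"
  else
    echo "hole at txid $(( last_img_txid + 1 ))"
  fi
}

can_replay 1000 11001   # X = 1000, Z = X + 10000 + 1: hole at txid 1001
can_replay 1000 1001    # contiguous edits: rollback possible
```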
> >>
> >> I then found [3], which says that with QJM everything is more
> >> complicated, and a regular rollback is the only option available.
> >> What I think this means is that, because the edit log is spread
> >> among multiple nodes, a rollback that keeps the data written between
> >> upgrade and rollback is not available, so in the worst case the data
> >> modified during that timeframe is lost. Not a big deal in my case,
> >> but I want to triple-check with you whether this is the correct
> >> interpretation, or whether there is another tutorial/guide/etc. that
> >> I haven't read with a different procedure :)
> >>
> >> Is my interpretation correct? If not, is there anybody with experience
> >> in HDFS upgrades that could shed some light on the subject?
> >>
> >> Thanks in advance!
> >>
> >> Luca
> >>
> >>
> >>
> >> [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
> >> [2] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
> >> [3] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled

Re: Testing rollback after HDFS upgrade

Posted by Luca Toscano <to...@gmail.com>.
Thanks a lot for the feedback, I really appreciated it :)

Luca

On Mon, Nov 23, 2020 at 6:18 PM Evans Ye <ev...@apache.org> wrote:
>
> When I was at TrendMicro, we did not back up the data either,
> since the upgrade itself effectively keeps the data around in two different versions.
> I'd say an FSImage backup is super important to make sure you can at least bring the namenode back online. And it's cheap.
> That said, I left the role 4 years ago, so take the following with your own judgement...
>
> For scenario 1: Upgrade goes fine, but the hdfs finalize step fails
> AFAIK finalize just cleans up the previous blocks and reclaims the space. So I think this is ok.
>
> For scenario 2: Upgrade fails, rollback fails as well, and restoring a previous namenode fsimage is not enough
> Restoring the FSImage should be ok, but you'll probably get several missing blocks due to the inconsistency. You can only accept the data loss, but the main portion of the data lake should be fine.
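After restoring the fsimage, the damage described here can be quantified with a quick health pass. A minimal sketch using standard commands (the `DRY_RUN` guard only prints them):

```shell
DRY_RUN="echo DRY-RUN:"

post_restore_check() {
  $DRY_RUN hdfs dfsadmin -report                 # live/dead DataNodes and capacity
  $DRY_RUN hdfs fsck / -list-corruptfileblocks   # enumerate files that lost blocks
}

post_restore_check
```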
>
> For additional backups, I think it's your call to weigh the risk against your business needs. Say that at the very moment you hit the upgrade button, the power/network suddenly goes down, leaving the entire cluster in a weird state. The chance is slim, but no one can say.
>
> Additional suggestions:
> 1. Classify your data into finer-grained levels and only back up the super important parts.
> 2. If you have staging/dev clusters, conscript them into the upgrade. Use distcp to back up data between prod/stage/dev clusters to minimize the data recency gap.
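Suggestion 2 can be as small as one distcp invocation per important dataset. A sketch with hypothetical cluster names and paths (the `DRY_RUN` guard only prints the command):

```shell
DRY_RUN="echo DRY-RUN:"

backup_dataset() {
  # $1 = HDFS path of a "super important" dataset; NameNode addresses are hypothetical
  $DRY_RUN hadoop distcp -update -p \
    "hdfs://prod-nn:8020$1" "hdfs://backup-nn:8020/backup$1"
}

backup_dataset /data/historical
```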
>
>
>
> Luca Toscano <to...@gmail.com> wrote on Mon, Nov 23, 2020 at 17:35:
>>
>> Hi everybody,
>>
>> I am currently struggling with a precautionary step before moving to
>> Bigtop, namely finding enough space on a temporary backup cluster
>> (separated from the one to upgrade) to save important data that my
>> team wouldn't be able to recover in case of HDFS failure (historical
>> data etc..). I have tested the upgrade and rollback several times
>> (from CDH to Bigtop), but the nightmare scenarios that I have in mind
>> are:
>>
>> - Upgrade goes fine, but the hdfs finalize step fails and leaves HDFS
>> inconsistent for some reason (datanodes' previous directory not
>> present anymore, etc..)
>> - Upgrade fails, rollback fails as well, and restoring a previous
>> namenode fsimage is not enough (inconsistent datanode state across the
>> cluster etc..)
>>
>> Having the absolutely essential data set aside on a separate cluster
>> (not involved in the upgrade) seems the right, more conservative
>> choice, but it is of course challenging when the data to back up spans
>> hundreds of terabytes. Are the above the right concerns, or are they
>> excessive given the risks of the upgrade? I've researched this a bit
>> and didn't find anybody backing up data (except for the Namenode's
>> metadata like fsimages etc.) before an HDFS upgrade.
>>
>> If anybody went through the same doubts and could give me some
>> feedback it would be really appreciated :)
>>
>> Thanks in advance,
>>
>> Luca
>>
>>
>>
>> On Mon, Sep 28, 2020 at 8:48 AM Evans Ye <ev...@apache.org> wrote:
>> >
>> > Oh ok. That sounds great!
>> >
>> > Luca Toscano <to...@gmail.com> wrote on Mon, Sep 28, 2020 at 14:31:
>> >>
>> >> Hi Evans,
>> >>
>> >> what I meant by a shared blog post is something that would go on
>> >> http://techblog.wikimedia.org/ and on
>> >> https://blogs.apache.org/bigtop/, stating that we collaborated and how
>> >> :)
>> >>
>> >> Luca
>> >>
>> >> On Mon, Sep 21, 2020 at 5:44 PM Evans Ye <ev...@apache.org> wrote:
>> >> >
>> >> > Yes. Overall it sounds great to me!
>> >> >
>> >> > I think the "summary of known pitfalls/bugs/etc.." section is worth adding and might be a super valuable part of the whole thing.
>> >> >
>> >> > | "The Blog post would be a good idea, maybe something that we can share between Wikimedia and Apache"
>> >> > What do you mean by this one, specifically? Currently I can think of the 3 options below. Do they match what you have in mind, or is it something else?
>> >> >
>> >> > 1. Bigtop wiki/blogs:
>> >> > https://cwiki.apache.org/confluence/display/BIGTOP/Index
>> >> > https://blogs.apache.org/bigtop/
>> >> >
>> >> > 2. Success At Apache:
>> >> > https://blogs.apache.org/foundation/category/SuccessAtApache
>> >> >
>> >> > 3. ApacheCon Talk (this year's CFP is over; we can do it next year as post-production experience sharing)
>> >> > https://apachecon.com/index.html
>> >> >
>> >> > - Evans
>> >> >
>> >> >
>> >> > Luca Toscano <to...@gmail.com> wrote on Sun, Sep 20, 2020 at 16:55:
>> >> >>
>> >> >> Hi Evans,
>> >> >>
>> >> >> I am late in answering as well :)
>> >> >>
>> >> >> I thought about it, and I think that with the right premises (example:
>> >> >> this is tailored to Wikimedia's environment, it assumes that
>> >> >> cluster downtime is acceptable, etc.) the storytelling style might be
>> >> >> easier to digest than a list of steps to follow. I think that in
>> >> >> all use cases different from Wikimedia's there will be adjustments to
>> >> >> make, and things that work/don't work/etc. One thing it might be
>> >> >> good to add at the end is a "summary of known pitfalls/bugs/etc."
>> >> >> found during the procedure, which in my case were the most
>> >> >> time-consuming part. I'll add it during the next few days so people
>> >> >> can comment :)
>> >> >>
>> >> >> The blog post would be a good idea, maybe something that we can share
>> >> >> between Wikimedia and Apache? I am planning to move to BigTop during
>> >> >> the upcoming quarter (October -> December), which will also show whether
>> >> >> my procedure works on a cluster of 60+ nodes (rather than on a small
>> >> >> one of 8 nodes) :D. As soon as it's done I'll follow up with this
>> >> >> list to organize a blog post, does that sound ok?
>> >> >>
>> >> >> Thanks a lot for all the support!
>> >> >>
>> >> >> Luca
>> >> >>
>> >> >> On Tue, Sep 15, 2020 at 6:06 PM Evans Ye <ev...@apache.org> wrote:
>> >> >> >
>> >> >> > Hey Luca,
>> >> >> >
>> >> >> > Sorry for the late reply. I was busy with a conference; it's just over now.
>> >> >> > Anyway, I think the writing is pretty informative, but it's more of a storytelling style, and several parts are Wikimedia-specific. That's why I think it's more suitable for a blog post.
>> >> >> >
>> >> >> > Anyhow, I think it's great content either way. If we keep it as is, I think we can make it available on Bigtop's wiki & blog, or even Success at Apache, with a title like "Wikimedia's story of migrating from CDH to Bigtop". If you want to make it more of an official guide, the title would be "CDH to Bigtop Migration Guide". We can state the limitations and the environment so that people understand it might not suit their own setup.
>> >> >> >
>> >> >> > Which way to go depends on how much effort you'd like to put in. Let me know what you think so that we can move forward.
>> >> >> >
>> >> >> > - Evans
>> >> >> >
>> >> >> > Luca Toscano <to...@gmail.com> wrote on Mon, Sep 7, 2020 at 15:39:
>> >> >> >>
>> >> >> >> Hi Evans,
>> >> >> >>
>> >> >> >> thanks for the review! What would you like to see to make the doc
>> >> >> >> more consumable for users? I can reshape the writing; I tried to
>> >> >> >> come up with something to kick off a conversation with the
>> >> >> >> community, and it would be interesting to know if anybody else has
>> >> >> >> a similar use case and how/if they are working on a solution.
>> >> >> >>
>> >> >> >> For the blog post, maybe we can coordinate something shared between
>> >> >> >> Apache and Wikimedia when the migration is done; I am sure it would
>> >> >> >> be a nice example of the two Foundations collaborating :)
>> >> >> >>
>> >> >> >> Luca
>> >> >> >>
>> >> >> >> On Wed, Sep 2, 2020 at 8:21 PM Evans Ye <ev...@apache.org> wrote:
>> >> >> >> >
>> >> >> >> > Hi Luca,
>> >> >> >> >
>> >> >> >> > I read through the doc briefly. I think it works very well as a blog post about Wikimedia's successful migration from CDH to Bigtop. However, the current writing doesn't seem easily consumable for users who are just looking for the solutions/steps to do similar migrations. May I know what title you would prefer if we put the doc in Bigtop's wiki?
>> >> >> >> >
>> >> >> >> > What I was thinking of is a migration cookbook. But we can discuss this. IMHO a Success at Apache [1] blog post is also possible, but I need to figure out who to talk to. Let me know what you think.
>> >> >> >> >
>> >> >> >> > [1] https://blogs.apache.org/foundation/category/SuccessAtApache
>> >> >> >> >
>> >> >> >> > Evans
>> >> >> >> >
>> >> >> >> > Evans Ye <ev...@apache.org> wrote on Sun, Aug 30, 2020 at 03:18:
>> >> >> >> >>
>> >> >> >> >> Hi Luca,
>> >> >> >> >>
>> >> >> >> >> I'm on vacation and so don't have time to review right now. I'll get back to you next week.
>> >> >> >> >>
>> >> >> >> >> The doc is definitely valuable. Once you have your production cluster migrated successfully, we can prove to other users that this is a battle-proven solution. Even more, we can give a talk at ApacheCon or somewhere else to further amplify the impact of the work. This is definitely an open-source winning case, so I think it deserves a talk.
>> >> >> >> >>
>> >> >> >> >> Evans
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> Luca Toscano <to...@gmail.com> wrote on Thu, Aug 27, 2020 at 21:11:
>> >> >> >> >>>
>> >> >> >> >>> Hi Evans,
>> >> >> >> >>>
>> >> >> >> >>> it took a while, I know, but I have the first version of the gdoc for the upgrade:
>> >> >> >> >>>
>> >> >> >> >>> https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit?usp=sharing
>> >> >> >> >>>
>> >> >> >> >>> I tried to list all the steps involved in migrating from CDH 5 to
>> >> >> >> >>> Bigtop 1.4; anybody interested should be able to comment. My idea
>> >> >> >> >>> is to discuss this for a few days and then possibly make it
>> >> >> >> >>> permanent somewhere in the Bigtop wiki (of course, only if the
>> >> >> >> >>> document is considered useful for others, etc.).
>> >> >> >> >>>
>> >> >> >> >>> During these days I tested the procedure multiple times, and I have
>> >> >> >> >>> also tested the HDFS finalize step; everything works as expected. I
>> >> >> >> >>> hope to be able to move to Bigtop during the next couple of months.
>> >> >> >> >>>
>> >> >> >> >>> Luca
>> >> >> >> >>>
>> >> >> >> >>> On Tue, Jul 21, 2020 at 4:04 PM Evans Ye <ev...@apache.org> wrote:
>> >> >> >> >>> >
>> >> >> >> >>> > Yes. I think a shared gdoc is preferred, and you can open a JIRA ticket to track it.
>> >> >> >> >>> >
>> >> >> >> >>> > Luca Toscano <to...@gmail.com> wrote on Mon, Jul 20, 2020 at 21:10:
>> >> >> >> >>> >>
>> >> >> >> >>> >> Hi Evans!
>> >> >> >> >>> >>
>> >> >> >> >>> >> What is the best medium to use for the documentation/comments? A
>> >> >> >> >>> >> shared gdoc or something similar?
>> >> >> >> >>> >>
>> >> >> >> >>> >> Luca
>> >> >> >> >>> >>
>> >> >> >> >>> >> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <ev...@apache.org> wrote:
>> >> >> >> >>> >> >
>> >> >> >> >>> >> > One thing I think would be great to have is a doc version of the steps for upgrade and rollback. The benefits:
>> >> >> >> >>> >> > 1. If anything unexpected happens during the automation, you have folks who can quickly understand what's going on and start investigating.
>> >> >> >> >>> >> > 2. Sharing the doc with us helps other OSS users do the migration. The env-specific parts are fine; we can leave comments on them. At the very least, other users can get a high-level view of a proven solution and then work out the rest of the pieces themselves.
>> >> >> >> >>> >> >
>> >> >> >> >>> >> > For the automation, I suggest splitting it into several stages and applying some validation steps (manual is ok) before kicking off the next stage.
>> >> >> >> >>> >> >
>> >> >> >> >>> >> > Best,
>> >> >> >> >>> >> > Evans
>> >> >> >> >>> >> >

> >> >> > Anyhow, I think either way it's great content. If we keep it as
> is, I think we can make it available on Bigtop's WIKI & Blog, or even
> Success at Apache with the title like "WikiMedia's story to migrate from
> CDH to Bigtop". If you want to make it more like an official guide, the
> title will be "CDH to Bigtop Migration Guide". We can state the limitation
> and environment so that people can take it w/ a caution that it might not
> suit their own environment.
> >> >> >
> >> >> > Which way to go depends on how much effort you'd like to take. Let
> me know what you think so that we can move forward.
> >> >> >
> >> >> > - Evans
> >> >> >
> >> >> > Luca Toscano <to...@gmail.com> 於 2020年9月7日 週一 下午3:39寫道:
> >> >> >>
> >> >> >> Hi Evans,
> >> >> >>
> >> >> >> thanks for the review! What are the things that you'd like to see
> to
> >> >> >> make them more consumable for users? I can re-shape the writing, I
> >> >> >> tried to come up with something to kick off a conversation with
> the
> >> >> >> community, it would be interesting to know if anybody else has a
> >> >> >> similar use case and how/if they are working on a solution.
> >> >> >>
> >> >> >> For the blogpost, maybe we can coordinate something shared between
> >> >> >> Apache and Wikimedia when the migration is done, I am sure it
> would be
> >> >> >> a nice example of the two Foundations collaborating :)
> >> >> >>
> >> >> >> Luca
> >> >> >>
> >> >> >> On Wed, Sep 2, 2020 at 8:21 PM Evans Ye <ev...@apache.org>
> wrote:
> >> >> >> >
> >> >> >> > Hi Luca,
> >> >> >> >
> >> >> >> > I read through the doc briefly. I think the doc works very well
> as a blogpost of a successful story for Wikimedia migrating from CDH to
> Bigtop. However, the current writing doesn't seem to be easily consumable
> for users' who are just looking into the solutions/steps for doing similar
> migrations. May I know what title you would prefer if we put the doc in
> Bigtop's wiki?
> >> >> >> >
> >> >> >> > What I was thinking is the cookbook for migration. But we can
> discuss this. IMHO a Success at Apache[1] blogpost is also possible. But I
> need to figure out who to talk to. Let me know what you think.
> >> >> >> >
> >> >> >> > [1]
> https://blogs.apache.org/foundation/category/SuccessAtApache
> >> >> >> >
> >> >> >> > Evans
> >> >> >> >
> >> >> >> > Evans Ye <ev...@apache.org> 於 2020年8月30日 週日 上午3:18寫道:
> >> >> >> >>
> >> >> >> >> Hi Luca,
> >> >> >> >>
> >> >> >> >> I'm on vacation hence do not have time for review right now.
> I'll get back to you next week.
> >> >> >> >>
> >> >> >> >> The doc is definitely valuable. Once you have your production
> migrated successfully. We can prove to the other users that this is a
> battle proven solution. Even more, we can give a talk at ApacheCon or
> somewhere else to further amplify the impact of the work. This is
> definitely an open source winning case so I think it deserve a talk.
> >> >> >> >>
> >> >> >> >> Evans
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Luca Toscano <to...@gmail.com> 於 2020年8月27日 週四
> 下午9:11寫道:
> >> >> >> >>>
> >> >> >> >>> Hi Evans,
> >> >> >> >>>
> >> >> >> >>> it took a while I know but I have the first version of the
> gdoc for the upgrade:
> >> >> >> >>>
> >> >> >> >>>
> https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit?usp=sharing
> >> >> >> >>>
> >> >> >> >>> I tried to list all the steps involved in migrating from CDH
> 5 to
> >> >> >> >>> Bigtop 1.4, anybody interested should be able to comment. The
> idea
> >> >> >> >>> that I have is to discuss this for a few days and then
> possibly make
> >> >> >> >>> it permanent somewhere in the Bigtop wiki? (of course if the
> document
> >> >> >> >>> will be considered useful for others etc..)
> >> >> >> >>>
> >> >> >> >>> During these days I tested the procedure multiple times, and
> I have
> >> >> >> >>> also tested the HDFS finalize step, everything works as
> expected. I
> >> >> >> >>> hope to be able to move to Bigtop during the next couple of
> months.
> >> >> >> >>>
> >> >> >> >>> Luca
> >> >> >> >>>
> >> >> >> >>> On Tue, Jul 21, 2020 at 4:04 PM Evans Ye <ev...@apache.org>
> wrote:
> >> >> >> >>> >
> >> >> >> >>> > Yes. I think a shared gdoc is prefered, and you can open up
> a JIRA ticket to track it.
> >> >> >> >>> >
> >> >> >> >>> > Luca Toscano <to...@gmail.com> 於 2020年7月20日 週一
> 21:10 寫道:
> >> >> >> >>> >>
> >> >> >> >>> >> Hi Evans!
> >> >> >> >>> >>
> >> >> >> >>> >> What is the best medium to use for the
> documentation/comments ? A
> >> >> >> >>> >> shared gdoc or something similar?
> >> >> >> >>> >>
> >> >> >> >>> >> Luca
> >> >> >> >>> >>
> >> >> >> >>> >> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <
> evansye@apache.org> wrote:
> >> >> >> >>> >> >
> >> >> >> >>> >> > One thing I think would be great to have is a doc
> version of the steps for upgrade and rollback. The benefits:
> >> >> >> >>> >> > 1. Anything unexpected happened during automation, you
> do have folks can quickly understand what's going on and get into the
> investigation.
> >> >> >> >>> >> > 2. Share the doc with us to help the others OSS users
> for doing the migration. For the env specific things I think that's fine.
> We can left comment on it. At least all the other users can get a high
> level view of a proven solution. And then they can go and find out the rest
> of the pieces by themselves.
> >> >> >> >>> >> >
> >> >> >> >>> >> > For automations, I suggest to split up the automation
> into several stages, and apply some validation steps(manually is ok) before
> kicking of the next stage.
> >> >> >> >>> >> >
> >> >> >> >>> >> > Best,
> >> >> >> >>> >> > Evans
> >> >> >> >>> >> >
> >> >> >> >>> >> >
> >> >> >> >>> >> >
> >> >> >> >>> >> >
> >> >> >> >>> >> > Luca Toscano <to...@gmail.com> 於 2020年7月15日 週三
> 下午9:07寫道:
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> Hi everybody,
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> I didn't get the time to work on this until recently,
> but I finally
> >> >> >> >>> >> >> managed to have a reliable procedure to upgrade from
> CDH to Bigtop 1.4
> >> >> >> >>> >> >> and rollback if needed. The assumptions are:
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> 1) It is ok to have (limited) cluster downtime.
> >> >> >> >>> >> >> 2) Rolling upgrade is not needed.
> >> >> >> >>> >> >> 3) QJM is used.
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> The procedure is listed in these two scripts:
> >> >> >> >>> >> >>
> >> >> >> >>> >> >>
> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
> >> >> >> >>> >> >>
> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> The code is highly dependent on my working environment,
> but it should
> >> >> >> >>> >> >> be clear to follow when writing a tutorial about how to
> migrate from
> >> >> >> >>> >> >> CDH to Bigtop. All the suggestions given by this
> mailing list were
> >> >> >> >>> >> >> really useful to reach a solution!
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> My next steps will be:
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> 1) Keep testing Bigtop 1.4 (finalize HDFS upgrade, run
> more hadoop
> >> >> >> >>> >> >> jobs, test Hive 2, etc..).
> >> >> >> >>> >> >> 2) Upgrade the production Hadoop cluster to Bigtop 1.4
> on Debian 9
> >> >> >> >>> >> >> (HDFS 2.6.0-cdh -> 2.8.5).
> >> >> >> >>> >> >> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 ->
> 2.10).
> >> >> >> >>> >> >> 4) Upgrade to Debian 10.
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> With automation it shouldn't be very difficult, I'll
> report progress once made.
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> Thanks a lot!
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> Luca
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <
> toscano.luca@gmail.com> wrote:
> >> >> >> >>> >> >> >
> >> >> >> >>> >> >> > Hi Evans,
> >> >> >> >>> >> >> >
> >> >> >> >>> >> >> > thanks a lot for the feedback, it was exactly what I
> needed. The
> >> >> >> >>> >> >> > simpler the better is definitely a good advice in
> this use case, I'll
> >> >> >> >>> >> >> > try this week another rollout/rollback and report
> back :)
> >> >> >> >>> >> >> >
> >> >> >> >>> >> >> > Luca
> >> >> >> >>> >> >> >
> >> >> >> >>> >> >> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <
> evansye@apache.org> wrote:
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > Hi Luca,
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > Thanks for reporting back and let us know how it
> goes.
> >> >> >> >>> >> >> > > I don't have the exactly HDFS with QJM HA upgrade
> experience. The experience I had was 0.20 non-HA upgrade to 2.0 non-HA and
> then enable QJM HA, which was back in 2014.
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > Regarding to rollback, I think you're right:
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > it is possible to rollback to HDFS’ state before
> the upgrade in case of unexpected problems.
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > My previous experience is the same that the
> rollback is merely a snapshot before the upgrade. If you've gone far, then
> rollback cost more data lost... Our runbook is if our sanity check failed
> during upgrade downtime, we perform the rollback immediately.
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > Regarding to that FSImage hole issue, I've
> experienced it as well.
> >> >> >> >>> >> >> > > I managed to fix it by manually edit the FSImage
> with offline image viewer[1] and delete that missing editLog in FSImage.
> That actually brought my cluster back with a little number of missing
> blocks.
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > Our experience says that the more the steps, the
> more the chance you failed the upgrade. We did good on dozen times of
> testing, DEV cluster, STAGING cluster, but still got missing blocks when
> upgrading Production...
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > The suggestion is to get your production in good
> shape first(the less decommissioned, offline DNs, disk failures, the
> better).
> >> >> >> >>> >> >> > > Also, maybe you can switch to non-HA mode and do
> the upgrade to simplify the things?
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > Not many helps but please let us know if any
> progress.
> >> >> >> >>> >> >> > > Last one, have you reached out to Hadoop community?
> the authors should know the most :)
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > - Evans
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > [1]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
> >> >> >> >>> >> >> > >
> >> >> >> >>> >> >> > > Luca Toscano <to...@gmail.com> 於 2020年4月8日
> 週三 21:03 寫道:
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >> Hi everybody,
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >> most of the bugs/issues/etc.. that I found while
> upgrading from CDH 5
> >> >> >> >>> >> >> > >> to BigTop 1.4 are fixed, I am now testing (as
> suggested also in here)
> >> >> >> >>> >> >> > >> upgrade/rollback procedures for HDFS (all written
> in
> >> >> >> >>> >> >> > >> https://phabricator.wikimedia.org/T244499, will
> add documentation
> >> >> >> >>> >> >> > >> about this at the end I promise).
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >> I initially followed [1][2] in my Test cluster,
> choosing the Rolling
> >> >> >> >>> >> >> > >> upgrade, but when I tried to rollback (after days
> since the initial
> >> >> >> >>> >> >> > >> upgrade) I ended up in an inconsistent state and I
> wasn't able to
> >> >> >> >>> >> >> > >> recover the previous HDFS state. I didn't save the
> exact error
> >> >> >> >>> >> >> > >> messages but the situation was more or less the
> following:
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >> FS-Image-rollback (created at the time of the
> upgrade) - up to transaction X
> >> >> >> >>> >> >> > >> FS-Image-current - up to transaction Y, with Y = X
> + 10000 (number
> >> >> >> >>> >> >> > >> totally made up for the example)
> >> >> >> >>> >> >> > >> QJM cluster: first available transaction Z = X +
> 10000 + 1
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >> When I tried to rolling rollback, the Namenode
> complained about a hole
> >> >> >> >>> >> >> > >> in the transaction log, namely at X + 1, so it
> refused to start. I
> >> >> >> >>> >> >> > >> tried to force a regular rollback, but the
> Namenode refused again
> >> >> >> >>> >> >> > >> saying that there was no available FS Image to
> roll back to. I checked
> >> >> >> >>> >> >> > >> in the Hadoop code and indeed the Namenode saves
> the fs image with
> >> >> >> >>> >> >> > >> different naming/path in case of a rolling upgrade
> or a regular
> >> >> >> >>> >> >> > >> upgrade. Both cases make sense, especially the
> first one since there
> >> >> >> >>> >> >> > >> was indeed a hole between the last transaction of
> the
> >> >> >> >>> >> >> > >> FS-Image-rollback and the first available
> transaction to reply on the
> >> >> >> >>> >> >> > >> QJM cluster. I chose the rolling upgrade initially
> since it was
> >> >> >> >>> >> >> > >> appealing: it promises to bring back the Namenodes
> to their previous
> >> >> >> >>> >> >> > >> versions, but keeping the data modified between
> upgrade and rollback.
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >> I then found [3], in which it is said that with
> QJM everything is more
> >> >> >> >>> >> >> > >> complicated, and a regular rollback is the only
> option available. What
> >> >> >> >>> >> >> > >> I think this mean is that due to the Edit log
> spread among multiple
> >> >> >> >>> >> >> > >> nodes, a rollback that keeps data between upgrade
> and rollback is not
> >> >> >> >>> >> >> > >> available, so worst case scenario the data
> modified during that
> >> >> >> >>> >> >> > >> timeframe is lost. Not a big deal in my case, but
> I want to triple
> >> >> >> >>> >> >> > >> check with you if this is the correct
> interpretation or if there is
> >> >> >> >>> >> >> > >> another tutorial/guide/etc.. that I haven't read
> with a different
> >> >> >> >>> >> >> > >> procedure :)
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >> Is my interpretation correct? If not, is there
> anybody with experience
> >> >> >> >>> >> >> > >> in HDFS upgrades that could shed some light on the
> subject?
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >> Thanks in advance!
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >> Luca
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >>
> >> >> >> >>> >> >> > >> [1]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
> >> >> >> >>> >> >> > >> [2]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
> >> >> >> >>> >> >> > >> [3]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled
>

Re: Testing rollback after HDFS upgrade

Posted by Luca Toscano <to...@gmail.com>.
Hi everybody,

I am currently struggling with a precautionary step before moving to
Bigtop, namely finding enough space on a temporary backup cluster
(separate from the one to upgrade) to save important data that my
team wouldn't be able to recover in case of HDFS failure (historical
data etc..). I have tested the upgrade and rollback several times
(from CDH to Bigtop), but the nightmare scenarios that I have in mind
are:

- Upgrade goes fine, but the hdfs finalize step fails and leaves HDFS
inconsistent for some reason (datanodes' previous directory not
present anymore, etc..)
- Upgrade fails, rollback fails as well, and restoring a previous
namenode fsimage is not enough (inconsistent datanode state across the
cluster etc..)
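The first nightmare scenario (finalize leaving DataNodes without their `previous` rollback snapshot) can at least be checked for up front. A minimal sketch, assuming the DataNode storage directory paths are known from dfs.datanode.data.dir (the paths themselves are illustrative):

```python
from pathlib import Path

def missing_rollback_dirs(storage_dirs):
    """Return the storage dirs whose `previous/` rollback snapshot is gone.

    Run this on every DataNode before `hdfs dfsadmin -finalizeUpgrade`;
    a non-empty result means a rollback would no longer be possible there.
    """
    return [d for d in storage_dirs
            if not (Path(d) / "previous").is_dir()]
```

A non-empty result would be a reason to stop and investigate before finalizing.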

Having the absolutely critical data set aside on a separate cluster
(not involved in the upgrade) seems like the right, more conservative
choice, but it is of course challenging when the data to back up
spans hundreds of terabytes. Are the above legitimate concerns, or
are they overblown considering the actual risks of the upgrade? I've
researched this a bit and didn't find anybody backing up data (except
for the Namenode's metadata like fsimages etc..) before an HDFS
upgrade.
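The sizing problem above is mostly replication arithmetic: the raw disk needed on the backup cluster is the logical data size times that cluster's replication factor. A trivial sketch (all numbers made up):

```python
TB = 10**12

def backup_raw_bytes(logical_bytes, replication=3):
    """Raw capacity needed to hold `logical_bytes` of HDFS data
    at the backup cluster's replication factor."""
    return logical_bytes * replication

# e.g. 300 TB of critical data at replication 2 needs 600 TB raw:
# backup_raw_bytes(300 * TB, replication=2) == 600 * TB
```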

If anybody went through the same doubts and could give me some
feedback it would be really appreciated :)

Thanks in advance,

Luca



On Mon, Sep 28, 2020 at 8:48 AM Evans Ye <ev...@apache.org> wrote:
>
> Oh ok. That sounds great!
>
> Luca Toscano <to...@gmail.com> 於 2020年9月28日 週一 14:31 寫道:
>>
>> Hi Evans,
>>
>> what I meant with a blog post shared would be something that goes in
>> http://techblog.wikimedia.org/ and on
>> https://blogs.apache.org/bigtop/, stating that we collaborated and how
>> :)
>>
>> Luca
>>
>> On Mon, Sep 21, 2020 at 5:44 PM Evans Ye <ev...@apache.org> wrote:
>> >
>> > Yes. Overall it sounds great to me!
>> >
>> > I think the "summary of known pitfalls/bugs/etc.." section is worth adding and might be a super valuable part of the whole thing.
>> >
>> > | "The Blog post would be a good idea, maybe something that we can share between Wikimedia and Apache"
>> > What do you mean by this one, specifically? Currently there are 3 things we can do, listed below. Do they match what you think, or is it something else?
>> >
>> > 1. Bigtop wiki/blogs:
>> > https://cwiki.apache.org/confluence/display/BIGTOP/Index
>> > https://blogs.apache.org/bigtop/
>> >
>> > 2. Success At Apache:
>> > https://blogs.apache.org/foundation/category/SuccessAtApache
>> >
>> > 3. ApacheCon Talk (this year's CFP is over; we can do it next year as post-production experience sharing)
>> > https://apachecon.com/index.html
>> >
>> > - Evans
>> >
>> >
>> > Luca Toscano <to...@gmail.com> 於 2020年9月20日 週日 下午4:55寫道:
>> >>
>> >> Hi Evans,
>> >>
>> >> I am late in answering as well :)
>> >>
>> >> I thought about it and I think that with the right premises (example:
>> >> this is tailored for Wikimedia's environment, it assumes that a
>> >> cluster downtime is acceptable, etc..) the storytelling style might be
>> >> easier to digest than a list of steps to follow. I think that in
>> >> all use cases different from Wikimedia there will be adjustments to
>> >> make, and things that work/don't-work/etc.. One thing that it might be
>> >> good to add at the end is a "summary of known pitfalls/bugs/etc.."
>> >> found during the procedure, that in my case were the most
>> >> time-consuming ones. I'll add it during the next few days and people
>> >> can comment :)
>> >>
>> >> The Blog post would be a good idea, maybe something that we can share
>> >> between Wikimedia and Apache? I am planning to move to BigTop during
>> >> the upcoming quarter (October -> December), that will also show if my
>> >> procedure works on a cluster of 60+ nodes (rather than on a small one
>> >> of 8 nodes) :D. As soon as I have done it I'll follow up with this
>> >> list to organize a blog post, does that sound ok?
>> >>
>> >> Thanks a lot for all the support!
>> >>
>> >> Luca
>> >>
>> >> On Tue, Sep 15, 2020 at 6:06 PM Evans Ye <ev...@apache.org> wrote:
>> >> >
>> >> > Hey Luca,
>> >> >
>> >> > Sorry for the late reply. I was busy with a conference; it's just over now.
>> >> > Anyway, I think the writing is pretty informative, but it's more of a storytelling style. Also, several parts are Wikimedia-specific. That's why I think it's more suitable for a blogpost.
>> >> >
>> >> > Anyhow, I think either way it's great content. If we keep it as is, I think we can make it available on Bigtop's WIKI & Blog, or even Success at Apache, with a title like "Wikimedia's story of migrating from CDH to Bigtop". If you want to make it more like an official guide, the title would be "CDH to Bigtop Migration Guide". We can state the limitations and environment so that people take it with the caution that it might not suit their own environment.
>> >> >
>> >> > Which way to go depends on how much effort you'd like to take. Let me know what you think so that we can move forward.
>> >> >
>> >> > - Evans
>> >> >
>> >> > Luca Toscano <to...@gmail.com> 於 2020年9月7日 週一 下午3:39寫道:
>> >> >>
>> >> >> Hi Evans,
>> >> >>
>> >> >> thanks for the review! What are the things that you'd like to see to
>> >> >> make them more consumable for users? I can re-shape the writing, I
>> >> >> tried to come up with something to kick off a conversation with the
>> >> >> community, it would be interesting to know if anybody else has a
>> >> >> similar use case and how/if they are working on a solution.
>> >> >>
>> >> >> For the blogpost, maybe we can coordinate something shared between
>> >> >> Apache and Wikimedia when the migration is done, I am sure it would be
>> >> >> a nice example of the two Foundations collaborating :)
>> >> >>
>> >> >> Luca
>> >> >>
>> >> >> On Wed, Sep 2, 2020 at 8:21 PM Evans Ye <ev...@apache.org> wrote:
>> >> >> >
>> >> >> > Hi Luca,
>> >> >> >
>> >> >> > I read through the doc briefly. I think the doc works very well as a blogpost of a success story about Wikimedia migrating from CDH to Bigtop. However, the current writing doesn't seem to be easily consumable for users who are just looking into the solutions/steps for doing similar migrations. May I know what title you would prefer if we put the doc in Bigtop's wiki?
>> >> >> >
>> >> >> > What I was thinking of is a cookbook for the migration. But we can discuss this. IMHO a Success at Apache [1] blogpost is also possible, but I need to figure out who to talk to. Let me know what you think.
>> >> >> >
>> >> >> > [1] https://blogs.apache.org/foundation/category/SuccessAtApache
>> >> >> >
>> >> >> > Evans
>> >> >> >
>> >> >> > Evans Ye <ev...@apache.org> 於 2020年8月30日 週日 上午3:18寫道:
>> >> >> >>
>> >> >> >> Hi Luca,
>> >> >> >>
>> >> >> >> I'm on vacation and don't have time for a review right now. I'll get back to you next week.
>> >> >> >>
>> >> >> >> The doc is definitely valuable. Once you have your production migrated successfully, we can show other users that this is a battle-proven solution. Even more, we can give a talk at ApacheCon or somewhere else to further amplify the impact of the work. This is definitely an open source winning case, so I think it deserves a talk.
>> >> >> >>
>> >> >> >> Evans
>> >> >> >>
>> >> >> >>
>> >> >> >> Luca Toscano <to...@gmail.com> 於 2020年8月27日 週四 下午9:11寫道:
>> >> >> >>>
>> >> >> >>> Hi Evans,
>> >> >> >>>
>> >> >> >>> it took a while, I know, but I have the first version of the gdoc for the upgrade:
>> >> >> >>>
>> >> >> >>> https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit?usp=sharing
>> >> >> >>>
>> >> >> >>> I tried to list all the steps involved in migrating from CDH 5 to
>> >> >> >>> Bigtop 1.4, anybody interested should be able to comment. The idea
>> >> >> >>> that I have is to discuss this for a few days and then possibly make
>> >> >> >>> it permanent somewhere in the Bigtop wiki? (of course only if the
>> >> >> >>> document is considered useful for others etc..)
>> >> >> >>>
>> >> >> >>> During these days I tested the procedure multiple times, and I have
>> >> >> >>> also tested the HDFS finalize step, everything works as expected. I
>> >> >> >>> hope to be able to move to Bigtop during the next couple of months.
>> >> >> >>>
>> >> >> >>> Luca
>> >> >> >>>
>> >> >> >>> On Tue, Jul 21, 2020 at 4:04 PM Evans Ye <ev...@apache.org> wrote:
>> >> >> >>> >
>> >> >> >>> > Yes. I think a shared gdoc is preferred, and you can open up a JIRA ticket to track it.
>> >> >> >>> >
>> >> >> >>> > Luca Toscano <to...@gmail.com> 於 2020年7月20日 週一 21:10 寫道:
>> >> >> >>> >>
>> >> >> >>> >> Hi Evans!
>> >> >> >>> >>
>> >> >> >>> >> What is the best medium to use for the documentation/comments? A
>> >> >> >>> >> shared gdoc or something similar?
>> >> >> >>> >>
>> >> >> >>> >> Luca
>> >> >> >>> >>
>> >> >> >>> >> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <ev...@apache.org> wrote:
>> >> >> >>> >> >
>> >> >> >>> >> > One thing I think would be great to have is a doc version of the steps for upgrade and rollback. The benefits:
>> >> >> >>> >> > 1. If anything unexpected happens during the automation, you have folks who can quickly understand what's going on and get into the investigation.
>> >> >> >>> >> > 2. Share the doc with us to help other OSS users do the migration. The env-specific things are fine; we can leave comments on them. At least all the other users can get a high-level view of a proven solution, and then they can go and find the rest of the pieces by themselves.
>> >> >> >>> >> >
>> >> >> >>> >> > For the automation, I suggest splitting it up into several stages, and applying some validation steps (manual is ok) before kicking off the next stage.
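The staging suggestion above can be sketched as a tiny driver where each stage runs only after the previous stage's validation passes. Stage names and validators here are made up for illustration; they are not the actual Wikimedia cookbooks:

```python
def run_staged(stages):
    """Run (name, action, validator) stages in order.

    Each action runs, then its validator; the run stops at the first
    stage whose validator returns False so an operator can investigate.
    Returns (completed stage names, name of the failed stage or None).
    """
    completed = []
    for name, action, validate in stages:
        action()
        if not validate():
            return completed, name  # halt here, do not start later stages
        completed.append(name)
    return completed, None
```

A manual check can be slotted in simply by making a validator prompt the operator instead of returning a computed boolean.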
>> >> >> >>> >> >
>> >> >> >>> >> > Best,
>> >> >> >>> >> > Evans
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> >
>> >> >> >>> >> > Luca Toscano <to...@gmail.com> 於 2020年7月15日 週三 下午9:07寫道:
>> >> >> >>> >> >>
>> >> >> >>> >> >> Hi everybody,
>> >> >> >>> >> >>
>> >> >> >>> >> >> I didn't get the time to work on this until recently, but I finally
>> >> >> >>> >> >> managed to have a reliable procedure to upgrade from CDH to Bigtop 1.4
>> >> >> >>> >> >> and rollback if needed. The assumptions are:
>> >> >> >>> >> >>
>> >> >> >>> >> >> 1) It is ok to have (limited) cluster downtime.
>> >> >> >>> >> >> 2) Rolling upgrade is not needed.
>> >> >> >>> >> >> 3) QJM is used.
>> >> >> >>> >> >>
>> >> >> >>> >> >> The procedure is listed in these two scripts:
>> >> >> >>> >> >>
>> >> >> >>> >> >> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
>> >> >> >>> >> >> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
>> >> >> >>> >> >>
>> >> >> >>> >> >> The code is highly dependent on my working environment, but it should
>> >> >> >>> >> >> be clear to follow when writing a tutorial about how to migrate from
>> >> >> >>> >> >> CDH to Bigtop. All the suggestions given by this mailing list were
>> >> >> >>> >> >> really useful to reach a solution!
>> >> >> >>> >> >>
>> >> >> >>> >> >> My next steps will be:
>> >> >> >>> >> >>
>> >> >> >>> >> >> 1) Keep testing Bigtop 1.4 (finalize HDFS upgrade, run more hadoop
>> >> >> >>> >> >> jobs, test Hive 2, etc..).
>> >> >> >>> >> >> 2) Upgrade the production Hadoop cluster to Bigtop 1.4 on Debian 9
>> >> >> >>> >> >> (HDFS 2.6.0-cdh -> 2.8.5).
>> >> >> >>> >> >> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
>> >> >> >>> >> >> 4) Upgrade to Debian 10.
>> >> >> >>> >> >>
>> >> >> >>> >> >> With automation it shouldn't be very difficult; I'll report progress once I've made some.
>> >> >> >>> >> >>
>> >> >> >>> >> >> Thanks a lot!
>> >> >> >>> >> >>
>> >> >> >>> >> >> Luca
>> >> >> >>> >> >>
>> >> >> >>> >> >> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <to...@gmail.com> wrote:
>> >> >> >>> >> >> >
>> >> >> >>> >> >> > Hi Evans,
>> >> >> >>> >> >> >
>> >> >> >>> >> >> > thanks a lot for the feedback, it was exactly what I needed. The
>> >> >> >>> >> >> > simpler the better is definitely a good advice in this use case, I'll
>> >> >> >>> >> >> > try this week another rollout/rollback and report back :)
>> >> >> >>> >> >> >
>> >> >> >>> >> >> > Luca
>> >> >> >>> >> >> >
>> >> >> >>> >> >> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <ev...@apache.org> wrote:
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > Hi Luca,
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > Thanks for reporting back and letting us know how it goes.
>> >> >> >>> >> >> > > I don't have experience with exactly this HDFS-with-QJM-HA upgrade. The experience I had was a 0.20 non-HA upgrade to 2.0 non-HA, and then enabling QJM HA, which was back in 2014.
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > Regarding rollback, I think you're right:
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > it is possible to rollback to HDFS’ state before the upgrade in case of unexpected problems.
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > My previous experience is the same: the rollback is merely a snapshot taken before the upgrade. The further you've gone, the more data a rollback costs... Our runbook is: if our sanity check fails during the upgrade downtime, we perform the rollback immediately.
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > Regarding that FSImage hole issue, I've experienced it as well.
>> >> >> >>> >> >> > > I managed to fix it by manually editing the FSImage with the offline image viewer [1] and deleting the reference to the missing edit log from the FSImage. That actually brought my cluster back with only a small number of missing blocks.
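The repair workflow described above can be sketched roughly as follows. The `hdfs oiv` commands in the comments are real (the ReverseXML processor exists since Hadoop 2.8), but the XML element layout in this example is a simplified assumption, not the exact oiv schema; inspect an actual dump before editing anything:

```python
# Workflow: dump the fsimage to XML, drop the offending entries, rebuild:
#   hdfs oiv -p XML        -i fsimage_0000000000000123456 -o fsimage.xml
#   ... edit fsimage.xml ...
#   hdfs oiv -p ReverseXML -i fsimage.xml -o fsimage_0000000000000123456
import xml.etree.ElementTree as ET

def drop_inodes(xml_text, bad_names):
    """Remove <inode> entries whose <name> is in bad_names.

    The <fsimage>/<INodeSection>/<inode> structure here is a simplified
    stand-in for the real oiv XML output.
    """
    root = ET.fromstring(xml_text)
    section = root.find("INodeSection")
    for inode in list(section):
        if inode.findtext("name") in bad_names:
            section.remove(inode)
    return ET.tostring(root, encoding="unicode")
```

This only illustrates the shape of the fix; as Evans notes, it can still leave some missing blocks behind.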
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > Our experience says that the more steps there are, the more likely the upgrade fails. We did fine over a dozen rounds of testing, on the DEV cluster and the STAGING cluster, but still got missing blocks when upgrading Production...
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > The suggestion is to get your production in good shape first (the fewer decommissioned/offline DNs and disk failures, the better).
>> >> >> >>> >> >> > > Also, maybe you can switch to non-HA mode and do the upgrade, to simplify things?
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > Not much help, but please let us know of any progress.
>> >> >> >>> >> >> > > Last one: have you reached out to the Hadoop community? The authors should know the most :)
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > - Evans
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
>> >> >> >>> >> >> > >
>> >> >> >>> >> >> > > Luca Toscano <to...@gmail.com> 於 2020年4月8日 週三 21:03 寫道:
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >> Hi everybody,
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >> most of the bugs/issues/etc.. that I found while upgrading from CDH 5
>> >> >> >>> >> >> > >> to BigTop 1.4 are fixed, I am now testing (as suggested also in here)
>> >> >> >>> >> >> > >> upgrade/rollback procedures for HDFS (all written in
>> >> >> >>> >> >> > >> https://phabricator.wikimedia.org/T244499, will add documentation
>> >> >> >>> >> >> > >> about this at the end I promise).
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >> I initially followed [1][2] in my Test cluster, choosing the Rolling
>> >> >> >>> >> >> > >> upgrade, but when I tried to rollback (after days since the initial
>> >> >> >>> >> >> > >> upgrade) I ended up in an inconsistent state and I wasn't able to
>> >> >> >>> >> >> > >> recover the previous HDFS state. I didn't save the exact error
>> >> >> >>> >> >> > >> messages but the situation was more or less the following:
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >> FS-Image-rollback (created at the time of the upgrade) - up to transaction X
>> >> >> >>> >> >> > >> FS-Image-current - up to transaction Y, with Y = X + 10000 (number
>> >> >> >>> >> >> > >> totally made up for the example)
>> >> >> >>> >> >> > >> QJM cluster: first available transaction Z = X + 10000 + 1
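In other words, a rollback image ending at transaction X is only usable if the journal still holds every transaction from X + 1 onward. A tiny sketch of the check (hypothetical helper, using the made-up numbers from the example above):

```python
def rollback_replay_gap(rollback_image_txid, first_journal_txid):
    """Return the (first, last) transaction ids that are needed to replay
    the journal on top of the rollback image but are missing, or None if
    the journal starts exactly where the image ends."""
    needed = rollback_image_txid + 1
    if first_journal_txid <= needed:
        return None  # no hole: replay can begin at `needed`
    return (needed, first_journal_txid - 1)

# The example's numbers: image up to X, QJM's first segment at Z = X + 10000 + 1,
# so transactions X + 1 .. X + 10000 are the hole the NameNode complains about.
X = 100_000
print(rollback_replay_gap(X, X + 10_001))  # -> (100001, 110000)
```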
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >> When I tried the rolling rollback, the Namenode complained about a hole
>> >> >> >>> >> >> > >> in the transaction log, namely at X + 1, so it refused to start. I
>> >> >> >>> >> >> > >> tried to force a regular rollback, but the Namenode refused again
>> >> >> >>> >> >> > >> saying that there was no available FS Image to roll back to. I checked
>> >> >> >>> >> >> > >> in the Hadoop code and indeed the Namenode saves the fs image with
>> >> >> >>> >> >> > >> different naming/path in case of a rolling upgrade or a regular
>> >> >> >>> >> >> > >> upgrade. Both cases make sense, especially the first one since there
>> >> >> >>> >> >> > >> was indeed a hole between the last transaction of the
>> >> >> >>> >> >> > >> FS-Image-rollback and the first available transaction to replay on the
>> >> >> >>> >> >> > >> QJM cluster. I chose the rolling upgrade initially since it was
>> >> >> >>> >> >> > >> appealing: it promises to bring back the Namenodes to their previous
>> >> >> >>> >> >> > >> versions, but keeping the data modified between upgrade and rollback.
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >> I then found [3], in which it is said that with QJM everything is more
>> >> >> >>> >> >> > >> complicated, and a regular rollback is the only option available. What
>> >> >> >>> >> >> > >> I think this means is that, due to the edit log being spread among multiple
>> >> >> >>> >> >> > >> nodes, a rollback that keeps data between upgrade and rollback is not
>> >> >> >>> >> >> > >> available, so worst case scenario the data modified during that
>> >> >> >>> >> >> > >> timeframe is lost. Not a big deal in my case, but I want to triple
>> >> >> >>> >> >> > >> check with you if this is the correct interpretation or if there is
>> >> >> >>> >> >> > >> another tutorial/guide/etc.. that I haven't read with a different
>> >> >> >>> >> >> > >> procedure :)
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >> Is my interpretation correct? If not, is there anybody with experience
>> >> >> >>> >> >> > >> in HDFS upgrades that could shed some light on the subject?
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >> Thanks in advance!
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >> Luca
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >>
>> >> >> >>> >> >> > >> [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
>> >> >> >>> >> >> > >> [2] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>> >> >> >>> >> >> > >> [3] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled
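For contrast with the rolling procedure in [2], the regular upgrade/rollback path in [1]/[3] boils down to "stop everything, then restart with the rollback startup options". Below is a hypothetical sketch of the ordered steps; the host and systemd unit names are illustrative, while `hdfs namenode -rollback` and starting DataNodes with the `-rollback` option are the documented mechanisms (data written after the upgrade is discarded):

```python
def regular_rollback_plan(namenode, datanodes):
    """Build the ordered (host, command) steps for a regular HDFS rollback:
    stop DataNodes, stop the NameNode, roll the NameNode metadata back to
    the pre-upgrade fsimage, then restart DataNodes with -rollback so they
    restore their pre-upgrade block directories."""
    plan = [(h, "systemctl stop hadoop-hdfs-datanode") for h in datanodes]
    plan.append((namenode, "systemctl stop hadoop-hdfs-namenode"))
    plan.append((namenode, "hdfs namenode -rollback"))
    plan.extend((h, "hdfs datanode -rollback") for h in datanodes)
    return plan
```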

Re: Testing rollback after HDFS upgrade

Posted by Evans Ye <ev...@apache.org>.
Oh ok. That sounds great!

On Mon, Sep 28, 2020 at 14:31 Luca Toscano <to...@gmail.com> wrote:

> Hi Evans,
>
> what I meant with a blog post shared would be something that goes in
> http://techblog.wikimedia.org/ and on
> https://blogs.apache.org/bigtop/, stating that we collaborated and how
> :)
>
> Luca
>
> On Mon, Sep 21, 2020 at 5:44 PM Evans Ye <ev...@apache.org> wrote:
> >
> > Yes. Overall it sounds great to me!
> >
> > I think the "summary of known pitfalls/bugs/etc." section is worth
> adding and might be a super-valuable part of the whole thing.
> >
> > | "The Blog post would be a good idea, maybe something that we can share
> between Wikimedia and Apache"
> > What do you mean by this one, specifically? I can currently think of the 3
> options below. Do they match what you had in mind, or is it something else?
> >
> > 1. Bigtop wiki/blogs:
> > https://cwiki.apache.org/confluence/display/BIGTOP/Index
> > https://blogs.apache.org/bigtop/
> >
> > 2. Success At Apache:
> > https://blogs.apache.org/foundation/category/SuccessAtApache
> >
> > 3. ApacheCon Talk (this year's CFP is over, we can do it next year as a
> post-production experience-sharing talk)
> > https://apachecon.com/index.html
> >
> > - Evans
> >
> >
> > On Sun, Sep 20, 2020 at 4:55 PM Luca Toscano <to...@gmail.com> wrote:
> >>
> >> Hi Evans,
> >>
> >> I am late in answering as well :)
> >>
> >> I thought about it and I think that with the right premises (example:
> >> this is tailored for Wikimedia's environment, it assumes that a
> >> cluster downtime is acceptable, etc..) the storytelling style might be
> >> easier to digest than a list of steps to follow. I think that in
> >> all use cases different from Wikimedia there will be adjustments to
> >> make, and things that work/don't-work/etc.. One thing that it might be
> >> good to add at the end is a "summary of known pitfalls/bugs/etc.."
> >> found during the procedure, that in my case were the most
> >> time-consuming ones. I'll add it during the next few days and people
> >> can comment :)
> >>
> >> The Blog post would be a good idea, maybe something that we can share
> >> between Wikimedia and Apache? I am planning to move to BigTop during
> >> the upcoming quarter (October -> December), that will also show if my
> >> procedure works on a cluster of 60+ nodes (rather than on a small one
> >> of 8 nodes) :D. As soon as I have done it I'll follow up with this
> >> list to organize a blog post, does that sound ok?
> >>
> >> Thanks a lot for all the support!
> >>
> >> Luca
> >>
> >> On Tue, Sep 15, 2020 at 6:06 PM Evans Ye <ev...@apache.org> wrote:
> >> >
> >> > Hey Luca,
> >> >
> >> > Sorry for the late reply. I was busy with a conference; it's just over
> now.
> >> > Anyway, I think the writing is pretty informative, but it's more of a
> storytelling style, and several parts are Wikimedia-specific. That's why I
> think it's more suitable for a blogpost.
> >> >
> >> > Anyhow, I think either way it's great content. If we keep it as is, I
> think we can make it available on Bigtop's WIKI & Blog, or even Success at
> Apache with the title like "WikiMedia's story to migrate from CDH to
> Bigtop". If you want to make it more like an official guide, the title will
> be "CDH to Bigtop Migration Guide". We can state the limitation  and
> environment so that people can take it w/ a caution that it might not suit
> their own environment.
> >> >
> >> > Which way to go depends on how much effort you'd like to take. Let me
> know what you think so that we can move forward.
> >> >
> >> > - Evans
> >> >
> >> > On Mon, Sep 7, 2020 at 3:39 PM Luca Toscano <to...@gmail.com> wrote:
> >> >>
> >> >> Hi Evans,
> >> >>
> >> >> thanks for the review! What are the things that you'd like to see to
> >> >> make them more consumable for users? I can re-shape the writing, I
> >> >> tried to come up with something to kick off a conversation with the
> >> >> community, it would be interesting to know if anybody else has a
> >> >> similar use case and how/if they are working on a solution.
> >> >>
> >> >> For the blogpost, maybe we can coordinate something shared between
> >> >> Apache and Wikimedia when the migration is done, I am sure it would
> be
> >> >> a nice example of the two Foundations collaborating :)
> >> >>
> >> >> Luca
> >> >>
> >> >> On Wed, Sep 2, 2020 at 8:21 PM Evans Ye <ev...@apache.org> wrote:
> >> >> >
> >> >> > Hi Luca,
> >> >> >
> >> >> > I read through the doc briefly. I think the doc works very well as
> a blogpost of a successful story for Wikimedia migrating from CDH to
> Bigtop. However, the current writing doesn't seem to be easily consumable
> for users who are just looking into the solutions/steps for doing similar
> migrations. May I know what title you would prefer if we put the doc in
> Bigtop's wiki?
> >> >> >
> >> >> > What I was thinking of is a cookbook for the migration. But we can
> discuss this. IMHO a Success at Apache[1] blogpost is also possible. But I
> need to figure out who to talk to. Let me know what you think.
> >> >> >
> >> >> > [1] https://blogs.apache.org/foundation/category/SuccessAtApache
> >> >> >
> >> >> > Evans
> >> >> >
> >> >> > On Sun, Aug 30, 2020 at 3:18 AM Evans Ye <ev...@apache.org> wrote:
> >> >> >>
> >> >> >> Hi Luca,
> >> >> >>
> >> >> >> I'm on vacation hence do not have time for review right now. I'll
> get back to you next week.
> >> >> >>
> >> >> >> The doc is definitely valuable. Once you have your production
> migrated successfully, we can prove to other users that this is a
> battle-proven solution. Even more, we can give a talk at ApacheCon or
> somewhere else to further amplify the impact of the work. This is
> definitely an open-source winning case, so I think it deserves a talk.
> >> >> >>
> >> >> >> Evans
> >> >> >>
> >> >> >>
> >> >> >> On Thu, Aug 27, 2020 at 9:11 PM Luca Toscano <to...@gmail.com> wrote:
> >> >> >>>
> >> >> >>> Hi Evans,
> >> >> >>>
> >> >> >>> it took a while I know but I have the first version of the gdoc
> for the upgrade:
> >> >> >>>
> >> >> >>>
> https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit?usp=sharing
> >> >> >>>
> >> >> >>> I tried to list all the steps involved in migrating from CDH 5 to
> >> >> >>> Bigtop 1.4, anybody interested should be able to comment. The
> idea
> >> >> >>> that I have is to discuss this for a few days and then possibly
> make
> >> >> >>> it permanent somewhere in the Bigtop wiki? (of course if the
> document
> >> >> >>> will be considered useful for others etc..)
> >> >> >>>
> >> >> >>> During these days I tested the procedure multiple times, and I
> have
> >> >> >>> also tested the HDFS finalize step, everything works as
> expected. I
> >> >> >>> hope to be able to move to Bigtop during the next couple of
> months.
> >> >> >>>
> >> >> >>> Luca
> >> >> >>>
> >> >> >>> On Tue, Jul 21, 2020 at 4:04 PM Evans Ye <ev...@apache.org>
> wrote:
> >> >> >>> >
> >> >> >>> > Yes. I think a shared gdoc is preferred, and you can open up a
> JIRA ticket to track it.
> >> >> >>> >
> >> >> >>> >> On Mon, Jul 20, 2020 at 21:10 Luca Toscano <to...@gmail.com>
> wrote:
> >> >> >>> >>
> >> >> >>> >> Hi Evans!
> >> >> >>> >>
> >> >> >>> >> What is the best medium to use for the documentation/comments
> ? A
> >> >> >>> >> shared gdoc or something similar?
> >> >> >>> >>
> >> >> >>> >> Luca
> >> >> >>> >>
> >> >> >>> >> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <ev...@apache.org>
> wrote:
> >> >> >>> >> >
> >> >> >>> >> > One thing I think would be great to have is a doc version
> of the steps for upgrade and rollback. The benefits:
> >> >> >>> >> > 1. If anything unexpected happens during the automation, you
> have something folks can quickly read to understand what's going on and get
> into the investigation.
> >> >> >>> >> > 2. Sharing the doc with us helps other OSS users do the
> migration. The env-specific things are fine; we can leave comments on them.
> At the least, other users can get a high-level view of a proven solution and
> then go and find out the rest of the pieces by themselves.
> >> >> >>> >> >
> >> >> >>> >> > For the automation, I suggest splitting it up into several
> stages, and applying some validation steps (manual is ok) before kicking
> off the next stage.
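The staged-automation suggestion above can be sketched as follows (hypothetical helper; the stage names are illustrative, not the actual Wikimedia cookbook steps):

```python
def run_staged(stages, validate):
    """Run (name, action) stages in order. After each stage, call
    validate(name); stop before kicking off the next stage if the
    validation fails. Returns (completed_stage_names, failed_stage)."""
    completed = []
    for name, action in stages:
        action()
        if not validate(name):
            return completed, name  # halt here so an operator can investigate
        completed.append(name)
    return completed, None

# Illustrative stages for a CDH -> Bigtop migration run:
stages = [
    ("stop-cluster", lambda: None),    # e.g. safemode + saveNamespace + stop daemons
    ("swap-packages", lambda: None),   # remove CDH packages, install Bigtop ones
    ("start-upgraded", lambda: None),  # start HDFS with the upgrade option
]
done, failed = run_staged(stages, validate=lambda name: True)
```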
> >> >> >>> >> >
> >> >> >>> >> > Best,
> >> >> >>> >> > Evans
> >> >> >>> >> >
> >> >> >>> >> >
> >> >> >>> >> >
> >> >> >>> >> >
> >> >> >>> >> > On Wed, Jul 15, 2020 at 9:07 PM Luca Toscano
> <to...@gmail.com> wrote:
> >> >> >>> >> >>
> >> >> >>> >> >> Hi everybody,
> >> >> >>> >> >>
> >> >> >>> >> >> I didn't get the time to work on this until recently, but
> I finally
> >> >> >>> >> >> managed to have a reliable procedure to upgrade from CDH
> to Bigtop 1.4
> >> >> >>> >> >> and rollback if needed. The assumptions are:
> >> >> >>> >> >>
> >> >> >>> >> >> 1) It is ok to have (limited) cluster downtime.
> >> >> >>> >> >> 2) Rolling upgrade is not needed.
> >> >> >>> >> >> 3) QJM is used.
> >> >> >>> >> >>
> >> >> >>> >> >> The procedure is listed in these two scripts:
> >> >> >>> >> >>
> >> >> >>> >> >>
> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
> >> >> >>> >> >>
> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
> >> >> >>> >> >>
> >> >> >>> >> >> The code is highly dependent on my working environment,
> but it should
> >> >> >>> >> >> be clear to follow when writing a tutorial about how to
> migrate from
> >> >> >>> >> >> CDH to Bigtop. All the suggestions given by this mailing
> list were
> >> >> >>> >> >> really useful to reach a solution!
> >> >> >>> >> >>
> >> >> >>> >> >> My next steps will be:
> >> >> >>> >> >>
> >> >> >>> >> >> 1) Keep testing Bigtop 1.4 (finalize HDFS upgrade, run
> more hadoop
> >> >> >>> >> >> jobs, test Hive 2, etc..).
> >> >> >>> >> >> 2) Upgrade the production Hadoop cluster to Bigtop 1.4 on
> Debian 9
> >> >> >>> >> >> (HDFS 2.6.0-cdh -> 2.8.5).
> >> >> >>> >> >> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
> >> >> >>> >> >> 4) Upgrade to Debian 10.
> >> >> >>> >> >>
> >> >> >>> >> >> With automation it shouldn't be very difficult, I'll
> report progress once made.
> >> >> >>> >> >>
> >> >> >>> >> >> Thanks a lot!
> >> >> >>> >> >>
> >> >> >>> >> >> Luca
> >> >> >>> >> >>
> >> >> >>> >> >> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <
> toscano.luca@gmail.com> wrote:
> >> >> >>> >> >> >
> >> >> >>> >> >> > Hi Evans,
> >> >> >>> >> >> >
> >> >> >>> >> >> > thanks a lot for the feedback, it was exactly what I
> needed. The
> >> >> >>> >> >> > > simpler the better is definitely good advice in this
> use case, I'll
> >> >> >>> >> >> > try this week another rollout/rollback and report back :)
> >> >> >>> >> >> >
> >> >> >>> >> >> > Luca
> >> >> >>> >> >> >
> >> >> >>> >> >> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <
> evansye@apache.org> wrote:
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > Hi Luca,
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > Thanks for reporting back and let us know how it goes.
> >> >> >>> >> >> > > I don't have exact experience with an HDFS QJM HA
> upgrade. The experience I had was a 0.20 non-HA upgrade to 2.0 non-HA,
> then enabling QJM HA, back in 2014.
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > Regarding rollback, I think you're right:
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > it is possible to rollback to HDFS’ state before the
> upgrade in case of unexpected problems.
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > My previous experience matches: the rollback
> is merely a snapshot taken before the upgrade. The further you've gone, the
> more data a rollback costs... Our runbook is: if our sanity checks fail
> during the upgrade downtime, we perform the rollback immediately.
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > Regarding that FSImage hole issue, I've experienced
> it as well.
> >> >> >>> >> >> > > I managed to fix it by manually editing the FSImage with
> the Offline Image Viewer[1] and deleting the reference to the missing edit
> log. That actually brought my cluster back with only a small number of
> missing blocks.
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > Our experience is that the more steps there are, the
> higher the chance the upgrade fails. We did well across a dozen rounds of
> testing, on the DEV cluster and the STAGING cluster, but still got missing
> blocks when upgrading Production...
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > The suggestion is to get your production in good shape
> first (the fewer decommissioned/offline DNs and disk failures, the better).
> >> >> >>> >> >> > > Also, maybe you can switch to non-HA mode and do the
> upgrade to simplify things?
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > Not much help, I know, but please let us know of any progress.
> >> >> >>> >> >> > > One last thing: have you reached out to the Hadoop
> community? The authors should know best :)
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > - Evans
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > [1]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
> >> >> >>> >> >> > >
> >> >> >>> >> >> > > On Wed, Apr 8, 2020 at 21:03 Luca Toscano
> <to...@gmail.com> wrote:
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >> Hi everybody,
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >> most of the bugs/issues/etc.. that I found while
> upgrading from CDH 5
> >> >> >>> >> >> > >> to BigTop 1.4 are fixed, I am now testing (as
> suggested also in here)
> >> >> >>> >> >> > >> upgrade/rollback procedures for HDFS (all written in
> >> >> >>> >> >> > >> https://phabricator.wikimedia.org/T244499, will add
> documentation
> >> >> >>> >> >> > >> about this at the end I promise).
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >> I initially followed [1][2] in my Test cluster,
> choosing the Rolling
> >> >> >>> >> >> > >> upgrade, but when I tried to rollback (after days
> since the initial
> >> >> >>> >> >> > >> upgrade) I ended up in an inconsistent state and I
> wasn't able to
> >> >> >>> >> >> > >> recover the previous HDFS state. I didn't save the
> exact error
> >> >> >>> >> >> > >> messages but the situation was more or less the
> following:
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >> FS-Image-rollback (created at the time of the
> upgrade) - up to transaction X
> >> >> >>> >> >> > >> FS-Image-current - up to transaction Y, with Y = X +
> 10000 (number
> >> >> >>> >> >> > >> totally made up for the example)
> >> >> >>> >> >> > >> QJM cluster: first available transaction Z = X +
> 10000 + 1
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >> When I tried the rolling rollback, the Namenode
> complained about a hole
> >> >> >>> >> >> > >> in the transaction log, namely at X + 1, so it
> refused to start. I
> >> >> >>> >> >> > >> tried to force a regular rollback, but the Namenode
> refused again
> >> >> >>> >> >> > >> saying that there was no available FS Image to roll
> back to. I checked
> >> >> >>> >> >> > >> in the Hadoop code and indeed the Namenode saves the
> fs image with
> >> >> >>> >> >> > >> different naming/path in case of a rolling upgrade or
> a regular
> >> >> >>> >> >> > >> upgrade. Both cases make sense, especially the first
> one since there
> >> >> >>> >> >> > >> was indeed a hole between the last transaction of the
> >> >> >>> >> >> > >> FS-Image-rollback and the first available transaction
> to replay on the
> >> >> >>> >> >> > >> QJM cluster. I chose the rolling upgrade initially
> since it was
> >> >> >>> >> >> > >> appealing: it promises to bring back the Namenodes to
> their previous
> >> >> >>> >> >> > >> versions, but keeping the data modified between
> upgrade and rollback.
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >> I then found [3], in which it is said that with QJM
> everything is more
> >> >> >>> >> >> > >> complicated, and a regular rollback is the only
> option available. What
> >> >> >>> >> >> > >> I think this means is that, due to the edit log being spread
> among multiple
> >> >> >>> >> >> > >> nodes, a rollback that keeps data between upgrade and
> rollback is not
> >> >> >>> >> >> > >> available, so worst case scenario the data modified
> during that
> >> >> >>> >> >> > >> timeframe is lost. Not a big deal in my case, but I
> want to triple
> >> >> >>> >> >> > >> check with you if this is the correct interpretation
> or if there is
> >> >> >>> >> >> > >> another tutorial/guide/etc.. that I haven't read with
> a different
> >> >> >>> >> >> > >> procedure :)
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >> Is my interpretation correct? If not, is there
> anybody with experience
> >> >> >>> >> >> > >> in HDFS upgrades that could shed some light on the
> subject?
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >> Thanks in advance!
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >> Luca
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >>
> >> >> >>> >> >> > >> [1]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
> >> >> >>> >> >> > >> [2]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
> >> >> >>> >> >> > >> [3]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled
>

Re: Testing rollback after HDFS upgrade

Posted by Luca Toscano <to...@gmail.com>.
Hi Evans,

what I meant with a blog post shared would be something that goes in
http://techblog.wikimedia.org/ and on
https://blogs.apache.org/bigtop/, stating that we collaborated and how
:)

Luca

On Mon, Sep 21, 2020 at 5:44 PM Evans Ye <ev...@apache.org> wrote:
>
> Yes. Overall it sounds great to me!
>
> I think the "summary of known pitfalls/bugs/etc." section is worth adding and might be a super-valuable part of the whole thing.
>
> | "The Blog post would be a good idea, maybe something that we can share between Wikimedia and Apache"
> What do you mean by this one, specifically? I can currently think of the 3 options below. Do they match what you had in mind, or is it something else?
>
> 1. Bigtop wiki/blogs:
> https://cwiki.apache.org/confluence/display/BIGTOP/Index
> https://blogs.apache.org/bigtop/
>
> 2. Success At Apache:
> https://blogs.apache.org/foundation/category/SuccessAtApache
>
> 3. ApacheCon Talk (this year's CFP is over, we can do it next year as a post-production experience-sharing talk)
> https://apachecon.com/index.html
>
> - Evans
>
>
> On Sun, Sep 20, 2020 at 4:55 PM Luca Toscano <to...@gmail.com> wrote:
>>
>> Hi Evans,
>>
>> I am late in answering as well :)
>>
>> I thought about it and I think that with the right premises (example:
>> this is tailored for Wikimedia's environment, it assumes that a
>> cluster downtime is acceptable, etc..) the storytelling style might be
>> easier to digest than a list of steps to follow. I think that in
>> all use cases different from Wikimedia there will be adjustments to
>> make, and things that work/don't-work/etc.. One thing that it might be
>> good to add at the end is a "summary of known pitfalls/bugs/etc.."
>> found during the procedure, that in my case were the most
>> time-consuming ones. I'll add it during the next few days and people
>> can comment :)
>>
>> The Blog post would be a good idea, maybe something that we can share
>> between Wikimedia and Apache? I am planning to move to BigTop during
>> the upcoming quarter (October -> December), that will also show if my
>> procedure works on a cluster of 60+ nodes (rather than on a small one
>> of 8 nodes) :D. As soon as I have done it I'll follow up with this
>> list to organize a blog post, does that sound ok?
>>
>> Thanks a lot for all the support!
>>
>> Luca
>>
>> On Tue, Sep 15, 2020 at 6:06 PM Evans Ye <ev...@apache.org> wrote:
>> >
>> > Hey Luca,
>> >
>> > Sorry for the late reply. I was busy with a conference; it's just over now.
>> > Anyway, I think the writing is pretty informative, but it's more of a storytelling style, and several parts are Wikimedia-specific. That's why I think it's more suitable for a blogpost.
>> >
>> > Anyhow, I think either way it's great content. If we keep it as is, I think we can make it available on Bigtop's WIKI & Blog, or even Success at Apache with the title like "WikiMedia's story to migrate from CDH to Bigtop". If you want to make it more like an official guide, the title will be "CDH to Bigtop Migration Guide". We can state the limitations and environment so that people can take it with the caution that it might not suit their own environment.
>> >
>> > Which way to go depends on how much effort you'd like to take. Let me know what you think so that we can move forward.
>> >
>> > - Evans
>> >
>> > On Mon, Sep 7, 2020 at 3:39 PM Luca Toscano <to...@gmail.com> wrote:
>> >>
>> >> Hi Evans,
>> >>
>> >> thanks for the review! What are the things that you'd like to see to
>> >> make them more consumable for users? I can re-shape the writing, I
>> >> tried to come up with something to kick off a conversation with the
>> >> community, it would be interesting to know if anybody else has a
>> >> similar use case and how/if they are working on a solution.
>> >>
>> >> For the blogpost, maybe we can coordinate something shared between
>> >> Apache and Wikimedia when the migration is done, I am sure it would be
>> >> a nice example of the two Foundations collaborating :)
>> >>
>> >> Luca
>> >>
>> >> On Wed, Sep 2, 2020 at 8:21 PM Evans Ye <ev...@apache.org> wrote:
>> >> >
>> >> > Hi Luca,
>> >> >
>> >> > I read through the doc briefly. I think the doc works very well as a blogpost of a successful story for Wikimedia migrating from CDH to Bigtop. However, the current writing doesn't seem to be easily consumable for users who are just looking into the solutions/steps for doing similar migrations. May I know what title you would prefer if we put the doc in Bigtop's wiki?
>> >> >
>> >> > What I was thinking of is a cookbook for the migration. But we can discuss this. IMHO a Success at Apache[1] blogpost is also possible. But I need to figure out who to talk to. Let me know what you think.
>> >> >
>> >> > [1] https://blogs.apache.org/foundation/category/SuccessAtApache
>> >> >
>> >> > Evans
>> >> >
>> >> > On Sun, Aug 30, 2020 at 3:18 AM Evans Ye <ev...@apache.org> wrote:
>> >> >>
>> >> >> Hi Luca,
>> >> >>
>> >> >> I'm on vacation hence do not have time for review right now. I'll get back to you next week.
>> >> >>
>> >> >> The doc is definitely valuable. Once you have your production migrated successfully, we can prove to other users that this is a battle-proven solution. Even more, we can give a talk at ApacheCon or somewhere else to further amplify the impact of the work. This is definitely an open-source winning case, so I think it deserves a talk.
>> >> >>
>> >> >> Evans
>> >> >>
>> >> >>
>> >> >> On Thu, Aug 27, 2020 at 9:11 PM Luca Toscano <to...@gmail.com> wrote:
>> >> >>>
>> >> >>> Hi Evans,
>> >> >>>
>> >> >>> it took a while I know but I have the first version of the gdoc for the upgrade:
>> >> >>>
>> >> >>> https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit?usp=sharing
>> >> >>>
>> >> >>> I tried to list all the steps involved in migrating from CDH 5 to
>> >> >>> Bigtop 1.4, anybody interested should be able to comment. The idea
>> >> >>> that I have is to discuss this for a few days and then possibly make
>> >> >>> it permanent somewhere in the Bigtop wiki? (of course if the document
>> >> >>> will be considered useful for others etc..)
>> >> >>>
>> >> >>> During these days I tested the procedure multiple times, and I have
>> >> >>> also tested the HDFS finalize step, everything works as expected. I
>> >> >>> hope to be able to move to Bigtop during the next couple of months.
>> >> >>>
>> >> >>> Luca
>> >> >>>
>> >> >>> On Tue, Jul 21, 2020 at 4:04 PM Evans Ye <ev...@apache.org> wrote:
>> >> >>> >
>> >> >>> > Yes. I think a shared gdoc is preferred, and you can open up a JIRA ticket to track it.
>> >> >>> >
>> >> >>> > On Mon, Jul 20, 2020 at 21:10 Luca Toscano <to...@gmail.com> wrote:
>> >> >>> >>
>> >> >>> >> Hi Evans!
>> >> >>> >>
>> >> >>> >> What is the best medium to use for the documentation/comments ? A
>> >> >>> >> shared gdoc or something similar?
>> >> >>> >>
>> >> >>> >> Luca
>> >> >>> >>
>> >> >>> >> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <ev...@apache.org> wrote:
>> >> >>> >> >
>> >> >>> >> > One thing I think would be great to have is a doc version of the steps for upgrade and rollback. The benefits:
>> >> >>> >> > 1. If anything unexpected happens during the automation, you have something folks can quickly read to understand what's going on and get into the investigation.
>> >> >>> >> > 2. Sharing the doc with us helps other OSS users do the migration. The env-specific things are fine; we can leave comments on them. At the least, other users can get a high-level view of a proven solution and then go and find out the rest of the pieces by themselves.
>> >> >>> >> >
>> >> >>> >> > For the automation, I suggest splitting it up into several stages, and applying some validation steps (manual is ok) before kicking off the next stage.
>> >> >>> >> >
>> >> >>> >> > Best,
>> >> >>> >> > Evans
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > On Wed, Jul 15, 2020 at 9:07 PM Luca Toscano <to...@gmail.com> wrote:
>> >> >>> >> >>
>> >> >>> >> >> Hi everybody,
>> >> >>> >> >>
>> >> >>> >> >> I didn't get the time to work on this until recently, but I finally
>> >> >>> >> >> managed to have a reliable procedure to upgrade from CDH to Bigtop 1.4
>> >> >>> >> >> and rollback if needed. The assumptions are:
>> >> >>> >> >>
>> >> >>> >> >> 1) It is ok to have (limited) cluster downtime.
>> >> >>> >> >> 2) Rolling upgrade is not needed.
>> >> >>> >> >> 3) QJM is used.
>> >> >>> >> >>
>> >> >>> >> >> The procedure is listed in these two scripts:
>> >> >>> >> >>
>> >> >>> >> >> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
>> >> >>> >> >> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
>> >> >>> >> >>
>> >> >>> >> >> The code is highly dependent on my working environment, but it should
>> >> >>> >> >> be clear to follow when writing a tutorial about how to migrate from
>> >> >>> >> >> CDH to Bigtop. All the suggestions given by this mailing list were
>> >> >>> >> >> really useful to reach a solution!
>> >> >>> >> >>
>> >> >>> >> >> My next steps will be:
>> >> >>> >> >>
>> >> >>> >> >> 1) Keep testing Bigtop 1.4 (finalize HDFS upgrade, run more hadoop
>> >> >>> >> >> jobs, test Hive 2, etc..).
>> >> >>> >> >> 2) Upgrade the production Hadoop cluster to Bigtop 1.4 on Debian 9
>> >> >>> >> >> (HDFS 2.6.0-cdh -> 2.8.5).
>> >> >>> >> >> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
>> >> >>> >> >> 4) Upgrade to Debian 10.
>> >> >>> >> >>
>> >> >>> >> >> With automation it shouldn't be very difficult, I'll report progress once made.
>> >> >>> >> >>
>> >> >>> >> >> Thanks a lot!
>> >> >>> >> >>
>> >> >>> >> >> Luca
>> >> >>> >> >>
>> >> >>> >> >> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <to...@gmail.com> wrote:
>> >> >>> >> >> >
>> >> >>> >> >> > Hi Evans,
>> >> >>> >> >> >
>> >> >>> >> >> > thanks a lot for the feedback, it was exactly what I needed. The
>> >> >>> >> >> > > simpler the better is definitely good advice in this use case, I'll
>> >> >>> >> >> > try this week another rollout/rollback and report back :)
>> >> >>> >> >> >
>> >> >>> >> >> > Luca
>> >> >>> >> >> >
>> >> >>> >> >> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <ev...@apache.org> wrote:
>> >> >>> >> >> > >
>> >> >>> >> >> > > Hi Luca,
>> >> >>> >> >> > >
>> >> >>> >> >> > > Thanks for reporting back and let us know how it goes.
>> >> >>> >> >> > > I don't have exact experience with an HDFS QJM HA upgrade. The experience I had was a 0.20 non-HA upgrade to 2.0 non-HA, then enabling QJM HA, back in 2014.
>> >> >>> >> >> > >
>> >> >>> >> >> > > Regarding rollback, I think you're right:
>> >> >>> >> >> > >
>> >> >>> >> >> > > it is possible to rollback to HDFS’ state before the upgrade in case of unexpected problems.
>> >> >>> >> >> > >
>> >> >>> >> >> > > My previous experience matches: the rollback is merely a snapshot taken before the upgrade. The further you've gone, the more data a rollback costs... Our runbook is: if our sanity checks fail during the upgrade downtime, we perform the rollback immediately.
>> >> >>> >> >> > >
>> >> >>> >> >> > > Regarding to that FSImage hole issue, I've experienced it as well.
>> >> >>> >> >> > > I managed to fix it by manually edit the FSImage with offline image viewer[1] and delete that missing editLog in FSImage. That actually brought my cluster back with a little number of missing blocks.
>> >> >>> >> >> > >
>> >> >>> >> >> > > Our experience says that the more the steps, the more the chance you failed the upgrade. We did good on dozen times of testing, DEV cluster, STAGING cluster, but still got missing blocks when upgrading Production...
>> >> >>> >> >> > >
>> >> >>> >> >> > > The suggestion is to get your production in good shape first(the less decommissioned, offline DNs, disk failures, the better).
>> >> >>> >> >> > > Also, maybe you can switch to non-HA mode and do the upgrade to simplify the things?
>> >> >>> >> >> > >
>> >> >>> >> >> > > Not many helps but please let us know if any progress.
>> >> >>> >> >> > > Last one, have you reached out to Hadoop community? the authors should know the most :)
>> >> >>> >> >> > >
>> >> >>> >> >> > > - Evans
>> >> >>> >> >> > >
>> >> >>> >> >> > > [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
>> >> >>> >> >> > >
>> >> >>> >> >> > > Luca Toscano <to...@gmail.com> 於 2020年4月8日 週三 21:03 寫道:
>> >> >>> >> >> > >>
>> >> >>> >> >> > >> Hi everybody,
>> >> >>> >> >> > >>
>> >> >>> >> >> > >> most of the bugs/issues/etc.. that I found while upgrading from CDH 5
>> >> >>> >> >> > >> to BigTop 1.4 are fixed, I am now testing (as suggested also in here)
>> >> >>> >> >> > >> upgrade/rollback procedures for HDFS (all written in
>> >> >>> >> >> > >> https://phabricator.wikimedia.org/T244499, will add documentation
>> >> >>> >> >> > >> about this at the end I promise).
>> >> >>> >> >> > >>
>> >> >>> >> >> > >> I initially followed [1][2] in my Test cluster, choosing the Rolling
>> >> >>> >> >> > >> upgrade, but when I tried to rollback (after days since the initial
>> >> >>> >> >> > >> upgrade) I ended up in an inconsistent state and I wasn't able to
>> >> >>> >> >> > >> recover the previous HDFS state. I didn't save the exact error
>> >> >>> >> >> > >> messages but the situation was more or less the following:
>> >> >>> >> >> > >>
>> >> >>> >> >> > >> FS-Image-rollback (created at the time of the upgrade) - up to transaction X
>> >> >>> >> >> > >> FS-Image-current - up to transaction Y, with Y = X + 10000 (number
>> >> >>> >> >> > >> totally made up for the example)
>> >> >>> >> >> > >> QJM cluster: first available transaction Z = X + 10000 + 1
>> >> >>> >> >> > >>
>> >> >>> >> >> > >> When I tried to rolling rollback, the Namenode complained about a hole
>> >> >>> >> >> > >> in the transaction log, namely at X + 1, so it refused to start. I
>> >> >>> >> >> > >> tried to force a regular rollback, but the Namenode refused again
>> >> >>> >> >> > >> saying that there was no available FS Image to roll back to. I checked
>> >> >>> >> >> > >> in the Hadoop code and indeed the Namenode saves the fs image with
>> >> >>> >> >> > >> different naming/path in case of a rolling upgrade or a regular
>> >> >>> >> >> > >> upgrade. Both cases make sense, especially the first one since there
>> >> >>> >> >> > >> was indeed a hole between the last transaction of the
>> >> >>> >> >> > >> FS-Image-rollback and the first available transaction to reply on the
>> >> >>> >> >> > >> QJM cluster. I chose the rolling upgrade initially since it was
>> >> >>> >> >> > >> appealing: it promises to bring back the Namenodes to their previous
>> >> >>> >> >> > >> versions, but keeping the data modified between upgrade and rollback.
>> >> >>> >> >> > >>
>> >> >>> >> >> > >> I then found [3], in which it is said that with QJM everything is more
>> >> >>> >> >> > >> complicated, and a regular rollback is the only option available. What
>> >> >>> >> >> > >> I think this mean is that due to the Edit log spread among multiple
>> >> >>> >> >> > >> nodes, a rollback that keeps data between upgrade and rollback is not
>> >> >>> >> >> > >> available, so worst case scenario the data modified during that
>> >> >>> >> >> > >> timeframe is lost. Not a big deal in my case, but I want to triple
>> >> >>> >> >> > >> check with you if this is the correct interpretation or if there is
>> >> >>> >> >> > >> another tutorial/guide/etc.. that I haven't read with a different
>> >> >>> >> >> > >> procedure :)
>> >> >>> >> >> > >>
>> >> >>> >> >> > >> Is my interpretation correct? If not, is there anybody with experience
>> >> >>> >> >> > >> in HDFS upgrades that could shed some light on the subject?
>> >> >>> >> >> > >>
>> >> >>> >> >> > >> Thanks in advance!
>> >> >>> >> >> > >>
>> >> >>> >> >> > >> Luca
>> >> >>> >> >> > >>
>> >> >>> >> >> > >>
>> >> >>> >> >> > >>
>> >> >>> >> >> > >> [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
>> >> >>> >> >> > >> [2] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>> >> >>> >> >> > >> [3] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled

Re: Testing rollback after HDFS upgrade

Posted by Evans Ye <ev...@apache.org>.
Yes. Overall it sounds great to me!

I think the "summary of known pitfalls/bugs/etc." section is worth adding
and might be a super valuable part of the whole thing.

| "The Blog post would be a good idea, maybe something that we can share
between Wikimedia and Apache"
What do you mean by this one, specifically? There are currently 3 options,
listed below. Do they match what you had in mind, or is it something else?

1. Bigtop wiki/blogs:
https://cwiki.apache.org/confluence/display/BIGTOP/Index
https://blogs.apache.org/bigtop/

2. Success At Apache:
https://blogs.apache.org/foundation/category/SuccessAtApache

3. ApacheCon Talk (this year's CFP is over, but we can do it next year as a
post-production experience-sharing talk)
https://apachecon.com/index.html

- Evans



Re: Testing rollback after HDFS upgrade

Posted by Luca Toscano <to...@gmail.com>.
Hi Evans,

I am late in answering as well :)

I thought about it and I think that with the right premises (example:
this is tailored for Wikimedia's environment, it assumes that cluster
downtime is acceptable, etc.) the storytelling style might be
easier to digest than a list of steps to follow. I think that in
all use cases other than Wikimedia's there will be adjustments to
make, and things that work/don't work/etc. One thing that might be
good to add at the end is a "summary of known pitfalls/bugs/etc."
found during the procedure, which in my case were the most
time-consuming ones. I'll add it during the next few days so that people
can comment :)

The blog post would be a good idea, maybe something that we can share
between Wikimedia and Apache? I am planning to move to Bigtop during
the upcoming quarter (October -> December), which will also show whether my
procedure works on a cluster of 60+ nodes (rather than on a small one
of 8 nodes) :D. As soon as I have done it I'll follow up with this
list to organize a blog post, does that sound ok?

Thanks a lot for all the support!

Luca

On Tue, Sep 15, 2020 at 6:06 PM Evans Ye <ev...@apache.org> wrote:
>
> Hey Luca,
>
> Sorry for the late reply. I was busy for a conference. It's just over now.
> Anyway, I  think the writing is pretty informative. But it's more like a storytelling style. Also several contents are WikiMedia specific things. That's why I think it's more suitable for a blogpost.
>
> Anyhow, I think either way it's great content. If we keep it as is, I think we can make it available on Bigtop's WIKI & Blog, or even Success at Apache with the title like "WikiMedia's story to migrate from CDH to Bigtop". If you want to make it more like an official guide, the title will be "CDH to Bigtop Migration Guide". We can state the limitation  and environment so that people can take it w/ a caution that it might not suit their own environment.
>
> Which way to go depends on how much effort you'd like to take. Let me know what you think so that we can move forward.
>
> - Evans
>
> Luca Toscano <to...@gmail.com> 於 2020年9月7日 週一 下午3:39寫道:
>>
>> Hi Evans,
>>
>> thanks for the review! What are the things that you'd like to see to
>> make them more consumable for users? I can re-shape the writing, I
>> tried to come up with something to kick off a conversation with the
>> community, it would be interesting to know if anybody else has a
>> similar use case and how/if they are working on a solution.
>>
>> For the blogpost, maybe we can coordinate something shared between
>> Apache and Wikimedia when the migration is done, I am sure it would be
>> a nice example of the two Foundations collaborating :)
>>
>> Luca
>>
>> On Wed, Sep 2, 2020 at 8:21 PM Evans Ye <ev...@apache.org> wrote:
>> >
>> > Hi Luca,
>> >
>> > I read through the doc briefly. I think the doc works very well as a blogpost of a successful story for Wikimedia migrating from CDH to Bigtop. However, the current writing doesn't seem to be easily consumable for users' who are just looking into the solutions/steps for doing similar migrations. May I know what title you would prefer if we put the doc in Bigtop's wiki?
>> >
>> > What I was thinking is the cookbook for migration. But we can discuss this. IMHO a Success at Apache[1] blogpost is also possible. But I need to figure out who to talk to. Let me know what you think.
>> >
>> > [1] https://blogs.apache.org/foundation/category/SuccessAtApache
>> >
>> > Evans
>> >
>> > Evans Ye <ev...@apache.org> 於 2020年8月30日 週日 上午3:18寫道:
>> >>
>> >> Hi Luca,
>> >>
>> >> I'm on vacation hence do not have time for review right now. I'll get back to you next week.
>> >>
>> >> The doc is definitely valuable. Once you have your production migrated successfully. We can prove to the other users that this is a battle proven solution. Even more, we can give a talk at ApacheCon or somewhere else to further amplify the impact of the work. This is definitely an open source winning case so I think it deserve a talk.
>> >>
>> >> Evans
>> >>
>> >>
>> >> Luca Toscano <to...@gmail.com> 於 2020年8月27日 週四 下午9:11寫道:
>> >>>
>> >>> Hi Evans,
>> >>>
>> >>> it took a while I know but I have the first version of the gdoc for the upgrade:
>> >>>
>> >>> https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit?usp=sharing
>> >>>
>> >>> I tried to list all the steps involved in migrating from CDH 5 to
>> >>> Bigtop 1.4, anybody interested should be able to comment. The idea
>> >>> that I have is to discuss this for a few days and then possibly make
>> >>> it permanent somewhere in the Bigtop wiki? (of course if the document
>> >>> will be considered useful for others etc..)
>> >>>
>> >>> During these days I tested the procedure multiple times, and I have
>> >>> also tested the HDFS finalize step, everything works as expected. I
>> >>> hope to be able to move to Bigtop during the next couple of months.
>> >>>
>> >>> Luca
>> >>>
>> >>> On Tue, Jul 21, 2020 at 4:04 PM Evans Ye <ev...@apache.org> wrote:
>> >>> >
>> >>> > Yes. I think a shared gdoc is prefered, and you can open up a JIRA ticket to track it.
>> >>> >
>> >>> > Luca Toscano <to...@gmail.com> 於 2020年7月20日 週一 21:10 寫道:
>> >>> >>
>> >>> >> Hi Evans!
>> >>> >>
>> >>> >> What is the best medium to use for the documentation/comments ? A
>> >>> >> shared gdoc or something similar?
>> >>> >>
>> >>> >> Luca
>> >>> >>
>> >>> >> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <ev...@apache.org> wrote:
>> >>> >> >
>> >>> >> > One thing I think would be great to have is a doc version of the steps for upgrade and rollback. The benefits:
>> >>> >> > 1. Anything unexpected happened during automation, you do have folks can quickly understand what's going on and get into the investigation.
>> >>> >> > 2. Share the doc with us to help the others OSS users for doing the migration. For the env specific things I think that's fine. We can left comment on it. At least all the other users can get a high level view of a proven solution. And then they can go and find out the rest of the pieces by themselves.
>> >>> >> >
>> >>> >> > For automations, I suggest to split up the automation into several stages, and apply some validation steps(manually is ok) before kicking of the next stage.
>> >>> >> >
>> >>> >> > Best,
>> >>> >> > Evans
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > Luca Toscano <to...@gmail.com> 於 2020年7月15日 週三 下午9:07寫道:
>> >>> >> >>
>> >>> >> >> Hi everybody,
>> >>> >> >>
>> >>> >> >> I didn't get the time to work on this until recently, but I finally
>> >>> >> >> managed to have a reliable procedure to upgrade from CDH to Bigtop 1.4
>> >>> >> >> and rollback if needed. The assumptions are:
>> >>> >> >>
>> >>> >> >> 1) It is ok to have (limited) cluster downtime.
>> >>> >> >> 2) Rolling upgrade is not needed.
>> >>> >> >> 3) QJM is used.
>> >>> >> >>
>> >>> >> >> The procedure is listed in these two scripts:
>> >>> >> >>
>> >>> >> >> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
>> >>> >> >> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
>> >>> >> >>
>> >>> >> >> The code is highly dependent on my working environment, but it should
>> >>> >> >> be clear to follow when writing a tutorial about how to migrate from
>> >>> >> >> CDH to Bigtop. All the suggestions given by this mailing list were
>> >>> >> >> really useful to reach a solution!
>> >>> >> >>
>> >>> >> >> My next steps will be:
>> >>> >> >>
>> >>> >> >> 1) Keep testing Bigtop 1.4 (finalize HDFS upgrade, run more hadoop
>> >>> >> >> jobs, test Hive 2, etc..).
>> >>> >> >> 2) Upgrade the production Hadoop cluster to Bigtop 1.4 on Debian 9
>> >>> >> >> (HDFS 2.6.0-cdh -> 2.8.5).
>> >>> >> >> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
>> >>> >> >> 4) Upgrade to Debian 10.
>> >>> >> >>
>> >>> >> >> With automation it shouldn't be very difficult, I'll report progress once made.
>> >>> >> >>
>> >>> >> >> Thanks a lot!
>> >>> >> >>
>> >>> >> >> Luca
>> >>> >> >>
>> >>> >> >> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <to...@gmail.com> wrote:
>> >>> >> >> >
>> >>> >> >> > Hi Evans,
>> >>> >> >> >
>> >>> >> >> > thanks a lot for the feedback, it was exactly what I needed. The
>> >>> >> >> > simpler the better is definitely a good advice in this use case, I'll
>> >>> >> >> > try this week another rollout/rollback and report back :)
>> >>> >> >> >
>> >>> >> >> > Luca
>> >>> >> >> >
>> >>> >> >> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <ev...@apache.org> wrote:
>> >>> >> >> > >
>> >>> >> >> > > Hi Luca,
>> >>> >> >> > >
>> >>> >> >> > > Thanks for reporting back and let us know how it goes.
>> >>> >> >> > > I don't have the exactly HDFS with QJM HA upgrade experience. The experience I had was 0.20 non-HA upgrade to 2.0 non-HA and then enable QJM HA, which was back in 2014.
>> >>> >> >> > >
>> >>> >> >> > > Regarding to rollback, I think you're right:
>> >>> >> >> > >
>> >>> >> >> > > it is possible to rollback to HDFS’ state before the upgrade in case of unexpected problems.
>> >>> >> >> > >
>> >>> >> >> > > My previous experience is the same that the rollback is merely a snapshot before the upgrade. If you've gone far, then rollback cost more data lost... Our runbook is if our sanity check failed during upgrade downtime, we perform the rollback immediately.
>> >>> >> >> > >
>> >>> >> >> > > Regarding to that FSImage hole issue, I've experienced it as well.
>> >>> >> >> > > I managed to fix it by manually edit the FSImage with offline image viewer[1] and delete that missing editLog in FSImage. That actually brought my cluster back with a little number of missing blocks.
>> >>> >> >> > >
>> >>> >> >> > > Our experience says that the more the steps, the more the chance you failed the upgrade. We did good on dozen times of testing, DEV cluster, STAGING cluster, but still got missing blocks when upgrading Production...
>> >>> >> >> > >
>> >>> >> >> > > The suggestion is to get your production in good shape first(the less decommissioned, offline DNs, disk failures, the better).
>> >>> >> >> > > Also, maybe you can switch to non-HA mode and do the upgrade to simplify the things?
>> >>> >> >> > >
>> >>> >> >> > > Not many helps but please let us know if any progress.
>> >>> >> >> > > Last one, have you reached out to Hadoop community? the authors should know the most :)
>> >>> >> >> > >
>> >>> >> >> > > - Evans
>> >>> >> >> > >
>> >>> >> >> > > [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
>> >>> >> >> > >
>> >>> >> >> > > Luca Toscano <to...@gmail.com> 於 2020年4月8日 週三 21:03 寫道:
>> >>> >> >> > >>
>> >>> >> >> > >> Hi everybody,
>> >>> >> >> > >>
>> >>> >> >> > >> most of the bugs/issues/etc. that I found while upgrading from CDH 5
>> >>> >> >> > >> to Bigtop 1.4 are fixed. I am now testing (as also suggested here)
>> >>> >> >> > >> upgrade/rollback procedures for HDFS (all written in
>> >>> >> >> > >> https://phabricator.wikimedia.org/T244499, will add documentation
>> >>> >> >> > >> about this at the end I promise).
>> >>> >> >> > >>
>> >>> >> >> > >> I initially followed [1][2] in my Test cluster, choosing the Rolling
>> >>> >> >> > >> upgrade, but when I tried to roll back (days after the initial
>> >>> >> >> > >> upgrade) I ended up in an inconsistent state and I wasn't able to
>> >>> >> >> > >> recover the previous HDFS state. I didn't save the exact error
>> >>> >> >> > >> messages but the situation was more or less the following:
>> >>> >> >> > >>
>> >>> >> >> > >> FS-Image-rollback (created at the time of the upgrade) - up to transaction X
>> >>> >> >> > >> FS-Image-current - up to transaction Y, with Y = X + 10000 (number
>> >>> >> >> > >> totally made up for the example)
>> >>> >> >> > >> QJM cluster: first available transaction Z = X + 10000 + 1
>> >>> >> >> > >>
>> >>> >> >> > >> When I tried the rolling rollback, the Namenode complained about a hole
>> >>> >> >> > >> in the transaction log, namely at X + 1, so it refused to start. I
>> >>> >> >> > >> tried to force a regular rollback, but the Namenode refused again
>> >>> >> >> > >> saying that there was no available FS Image to roll back to. I checked
>> >>> >> >> > >> in the Hadoop code and indeed the Namenode saves the fs image with
>> >>> >> >> > >> different naming/path in case of a rolling upgrade or a regular
>> >>> >> >> > >> upgrade. Both cases make sense, especially the first one since there
>> >>> >> >> > >> was indeed a hole between the last transaction of the
>> >>> >> >> > >> FS-Image-rollback and the first available transaction to replay from the
>> >>> >> >> > >> QJM cluster. I chose the rolling upgrade initially since it was
>> >>> >> >> > >> appealing: it promises to bring back the Namenodes to their previous
>> >>> >> >> > >> versions, but keeping the data modified between upgrade and rollback.
>> >>> >> >> > >>
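The hole described above comes down to simple transaction-ID arithmetic. A sketch, reusing the made-up numbers from the example, of the consistency check that makes the NameNode refuse to start:

```python
# Sketch of the check behind the "hole in the transaction log" error:
# after rolling back to an FSImage whose last transaction is X, the
# NameNode must replay edits starting at X + 1. If the journal's first
# available transaction is later than that, there is a hole.
def first_missing_txid(fsimage_last_txid, journal_first_txid):
    """Return the first missing transaction id, or None if there is no hole."""
    needed = fsimage_last_txid + 1
    if journal_first_txid > needed:
        return needed
    return None

X = 1_000          # last txid in the rollback FSImage
Y = X + 10_000     # last txid in the current FSImage (made-up gap)
Z = Y + 1          # first txid still available on the QJM cluster

print(first_missing_txid(X, Z))   # rollback image: hole starting at X + 1
print(first_missing_txid(Y, Z))   # current image: no hole
```

This is why the rollback image is unusable while the current image still works: only the former needs edits that the JournalNodes have already moved past.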
>> >>> >> >> > >> I then found [3], in which it is said that with QJM everything is more
>> >>> >> >> > >> complicated, and a regular rollback is the only option available. What
>> >>> >> >> > >> I think this means is that, because the edit log is spread among multiple
>> >>> >> >> > >> nodes, a rollback that keeps data between upgrade and rollback is not
>> >>> >> >> > >> available, so worst case scenario the data modified during that
>> >>> >> >> > >> timeframe is lost. Not a big deal in my case, but I want to triple
>> >>> >> >> > >> check with you if this is the correct interpretation or if there is
>> >>> >> >> > >> another tutorial/guide/etc.. that I haven't read with a different
>> >>> >> >> > >> procedure :)
>> >>> >> >> > >>
>> >>> >> >> > >> Is my interpretation correct? If not, is there anybody with experience
>> >>> >> >> > >> in HDFS upgrades that could shed some light on the subject?
>> >>> >> >> > >>
>> >>> >> >> > >> Thanks in advance!
>> >>> >> >> > >>
>> >>> >> >> > >> Luca
>> >>> >> >> > >>
>> >>> >> >> > >>
>> >>> >> >> > >>
>> >>> >> >> > >> [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
>> >>> >> >> > >> [2] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>> >>> >> >> > >> [3] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled
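For orientation, the non-rolling (downtime) path that the thread converges on — stop everything, upgrade packages, start the NameNode with `-upgrade`, keep `-rollback` as the escape hatch — can be sketched as a dry run. This is a hypothetical outline, not the Wikimedia cookbooks mentioned in the thread; exact daemon ordering and package steps depend on the environment:

```shell
#!/bin/sh
# Dry-run sketch of a downtime HDFS upgrade with rollback as the escape
# hatch. Nothing is executed; each step is only appended to a plan.
PLAN=""
step() { PLAN="$PLAN$*
"; }

step "stop all YARN/HDFS daemons (clients first, then DNs, NNs, JNs)"
step "hdfs dfsadmin -safemode enter && hdfs dfsadmin -saveNamespace"
step "back up the NameNode metadata and JournalNode edits directories"
step "swap CDH packages for Bigtop packages on every node"
step "hdfs namenode -upgrade        # creates the 'previous' checkpoint"
step "restart JNs and DNs; run sanity checks (fsck, test jobs)"
step "on success, later: hdfs dfsadmin -finalizeUpgrade"
step "on failure: stop daemons, reinstall old packages, hdfs namenode -rollback"

printf '%s' "$PLAN"
```

Until `-finalizeUpgrade` is run, HDFS keeps the pre-upgrade checkpoint around, which is what makes the rollback step possible at the cost of any data written after the upgrade.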

Re: Testing rollback after HDFS upgrade

Posted by Evans Ye <ev...@apache.org>.
Hey Luca,

Sorry for the late reply. I was busy with a conference, which is just over now.
Anyway, I think the writing is pretty informative, but it leans toward a
storytelling style, and several parts are Wikimedia-specific. That's why I
think it's more suitable for a blogpost.

Anyhow, either way it's great content. If we keep it as is, I think we can
make it available on Bigtop's wiki & blog, or even Success at Apache, with a
title like "Wikimedia's story of migrating from CDH to Bigtop". If you want
to make it more of an official guide, the title would be "CDH to Bigtop
Migration Guide". We can state the limitations and the environment so that
people take it with the caution that it might not suit their own setup.

Which way to go depends on how much effort you'd like to put in. Let me know
what you think so that we can move forward.

- Evans

Luca Toscano <to...@gmail.com> 於 2020年9月7日 週一 下午3:39寫道:

> On Wed, Sep 2, 2020 at 8:21 PM Evans Ye <ev...@apache.org> wrote:
> >
> > Evans Ye <ev...@apache.org> 於 2020年8月30日 週日 上午3:18寫道:
> >>
> >> Hi Luca,
> >>
>> I'm on vacation, hence I do not have time to review right now. I'll get
>> back to you next week.
> >>
>> The doc is definitely valuable. Once you have your production migrated
>> successfully, we can show other users that this is a battle-proven
>> solution. Even more, we can give a talk at ApacheCon or somewhere else to
>> further amplify the impact of the work. This is definitely an open source
>> win, so I think it deserves a talk.
> >>
> >> Evans
> >>
> >>
> >> Luca Toscano <to...@gmail.com> 於 2020年8月27日 週四 下午9:11寫道:
> >>>
> >>> Hi Evans,
> >>>
> >>> it took a while, I know, but I have the first version of the gdoc for
> the upgrade:
> >>>
> >>>
> https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit?usp=sharing
> >>>
> >>> I tried to list all the steps involved in migrating from CDH 5 to
> >>> Bigtop 1.4, anybody interested should be able to comment. The idea
> >>> that I have is to discuss this for a few days and then possibly make
> >>> it permanent somewhere in the Bigtop wiki? (of course if the document
> >>> will be considered useful for others etc..)
> >>>
> >>> During these days I tested the procedure multiple times, and I have
> >>> also tested the HDFS finalize step, everything works as expected. I
> >>> hope to be able to move to Bigtop during the next couple of months.
> >>>
> >>> Luca
> >>>
> >>> On Tue, Jul 21, 2020 at 4:04 PM Evans Ye <ev...@apache.org> wrote:
> >>> >
> >>> > Yes. I think a shared gdoc is prefered, and you can open up a JIRA
> ticket to track it.
> >>> >
> >>> > Luca Toscano <to...@gmail.com> 於 2020年7月20日 週一 21:10 寫道:
> >>> >>
> >>> >> Hi Evans!
> >>> >>
> >>> >> What is the best medium to use for the documentation/comments ? A
> >>> >> shared gdoc or something similar?
> >>> >>
> >>> >> Luca
> >>> >>
> >>> >> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <ev...@apache.org>
> wrote:
> >>> >> >
> >>> >> > One thing I think would be great to have is a doc version of the
> steps for upgrade and rollback. The benefits:
> >>> >> > 1. If anything unexpected happens during the automation, folks can
> quickly understand what's going on and get into the investigation.
> >>> >> > 2. Sharing the doc with us helps other OSS users do the migration.
> The env-specific parts are fine; we can leave comments on them. At least
> other users can get a high-level view of a proven solution, and then they
> can go and find the rest of the pieces by themselves.
> >>> >> >
> >>> >> > For the automation, I suggest splitting it into several stages and
> applying some validation steps (manual is ok) before kicking off the next
> stage.
> >>> >> >
> >>> >> > Best,
> >>> >> > Evans
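The staged-automation advice above can be sketched as a small runner: run the migration as discrete stages, each gated by a validation step, and stop at the first failure instead of ploughing on. The stage names and the toy `state` dict below are invented for illustration; the real cookbooks referenced in this thread are Wikimedia-specific:

```python
# Sketch of staged automation with per-stage validation gates.
from typing import Callable, List, Tuple

# A stage is (name, action, validate): run action, then check validate.
Stage = Tuple[str, Callable[[], None], Callable[[], bool]]

def run_stages(stages: List[Stage]) -> List[str]:
    """Run each stage in order; stop as soon as a validation fails."""
    completed = []
    for name, action, validate in stages:
        action()
        if not validate():
            print(f"validation failed after stage {name!r}; stopping")
            break
        completed.append(name)
    return completed

# Toy cluster state standing in for real checks (package versions,
# daemon health, fsck results, test jobs, ...).
state = {"packages": "cdh", "hdfs_up": False}

stages: List[Stage] = [
    ("stop cluster", lambda: state.update(hdfs_up=False),
     lambda: not state["hdfs_up"]),
    ("swap packages", lambda: state.update(packages="bigtop"),
     lambda: state["packages"] == "bigtop"),
    ("start + upgrade hdfs", lambda: state.update(hdfs_up=True),
     lambda: state["hdfs_up"]),
]

print(run_stages(stages))
```

The point of the pattern is the gate between stages: a failed check leaves the cluster in a known intermediate state, which is exactly where a manual rollback decision can still be made.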
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > Luca Toscano <to...@gmail.com> 於 2020年7月15日 週三 下午9:07寫道:
> >>> >> >>
> >>> >> >> Hi everybody,
> >>> >> >>
> >>> >> >> I didn't get the time to work on this until recently, but I
> finally
> >>> >> >> managed to have a reliable procedure to upgrade from CDH to
> Bigtop 1.4
> >>> >> >> and rollback if needed. The assumptions are:
> >>> >> >>
> >>> >> >> 1) It is ok to have (limited) cluster downtime.
> >>> >> >> 2) Rolling upgrade is not needed.
> >>> >> >> 3) QJM is used.
> >>> >> >>
> >>> >> >> The procedure is listed in these two scripts:
> >>> >> >>
> >>> >> >>
> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
> >>> >> >>
> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
> >>> >> >>
> >>> >> >> The code is highly dependent on my working environment, but it
> should
> >>> >> >> be clear to follow when writing a tutorial about how to migrate
> from
> >>> >> >> CDH to Bigtop. All the suggestions given by this mailing list
> were
> >>> >> >> really useful to reach a solution!
> >>> >> >>
> >>> >> >> My next steps will be:
> >>> >> >>
> >>> >> >> 1) Keep testing Bigtop 1.4 (finalize HDFS upgrade, run more
> hadoop
> >>> >> >> jobs, test Hive 2, etc..).
> >>> >> >> 2) Upgrade the production Hadoop cluster to Bigtop 1.4 on Debian
> 9
> >>> >> >> (HDFS 2.6.0-cdh -> 2.8.5).
> >>> >> >> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
> >>> >> >> 4) Upgrade to Debian 10.
> >>> >> >>
> >>> >> >> With automation it shouldn't be very difficult, I'll report
> progress once made.
> >>> >> >>
> >>> >> >> Thanks a lot!
> >>> >> >>
> >>> >> >> Luca
> >>> >> >>
> >>> >> >> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <
> toscano.luca@gmail.com> wrote:
> >>> >> >> >
> >>> >> >> > Hi Evans,
> >>> >> >> >
> >>> >> >> > thanks a lot for the feedback, it was exactly what I needed.
> >>> >> >> > "The simpler the better" is definitely good advice in this use
> >>> >> >> > case. I'll try another rollout/rollback this week and report back :)
> >>> >> >> >
> >>> >> >> > Luca
> >>> >> >> >
> >>> >> >> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <ev...@apache.org>
> wrote:

Re: Testing rollback after HDFS upgrade

Posted by Luca Toscano <to...@gmail.com>.
Hi Evans,

thanks for the review! What would you like to see to make the doc more
consumable for users? I can re-shape the writing; I tried to come up with
something to kick off a conversation with the community, and it would be
interesting to know if anybody else has a similar use case and how/if they
are working on a solution.

For the blogpost, maybe we can coordinate something shared between
Apache and Wikimedia when the migration is done, I am sure it would be
a nice example of the two Foundations collaborating :)

Luca


Re: Testing rollback after HDFS upgrade

Posted by Evans Ye <ev...@apache.org>.
Hi Luca,

I read through the doc briefly. I think it works very well as a blogpost
telling the success story of Wikimedia migrating from CDH to Bigtop.
However, the current writing doesn't seem easily consumable for users who
are just looking for the solutions/steps to do a similar migration. May I
know what title you would prefer if we put the doc in Bigtop's wiki?

What I was thinking of is a migration cookbook, but we can discuss this.
IMHO a Success at Apache[1] blogpost is also possible, but I need to figure
out who to talk to. Let me know what you think.

[1] https://blogs.apache.org/foundation/category/SuccessAtApache

Evans

>> I'll
>> >> >> > try this week another rollout/rollback and report back :)
>> >> >> >
>> >> >> > Luca
>> >> >> >
>> >> >> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <ev...@apache.org>
>> wrote:
>> >> >> > >
>> >> >> > > Hi Luca,
>> >> >> > >
>> >> >> > > Thanks for reporting back and let us know how it goes.
>> >> >> > > I don't have the exactly HDFS with QJM HA upgrade experience.
>> The experience I had was 0.20 non-HA upgrade to 2.0 non-HA and then enable
>> QJM HA, which was back in 2014.
>> >> >> > >
>> >> >> > > Regarding to rollback, I think you're right:
>> >> >> > >
>> >> >> > > it is possible to rollback to HDFS’ state before the upgrade in
>> case of unexpected problems.
>> >> >> > >
>> >> >> > > My previous experience is the same that the rollback is merely
>> a snapshot before the upgrade. If you've gone far, then rollback cost more
>> data lost... Our runbook is if our sanity check failed during upgrade
>> downtime, we perform the rollback immediately.
>> >> >> > >
>> >> >> > > Regarding to that FSImage hole issue, I've experienced it as
>> well.
>> >> >> > > I managed to fix it by manually edit the FSImage with offline
>> image viewer[1] and delete that missing editLog in FSImage. That actually
>> brought my cluster back with a little number of missing blocks.
>> >> >> > >
>> >> >> > > Our experience says that the more the steps, the more the
>> chance you failed the upgrade. We did good on dozen times of testing, DEV
>> cluster, STAGING cluster, but still got missing blocks when upgrading
>> Production...
>> >> >> > >
>> >> >> > > The suggestion is to get your production in good shape
>> first(the less decommissioned, offline DNs, disk failures, the better).
>> >> >> > > Also, maybe you can switch to non-HA mode and do the upgrade to
>> simplify the things?
>> >> >> > >
>> >> >> > > Not many helps but please let us know if any progress.
>> >> >> > > Last one, have you reached out to Hadoop community? the authors
>> should know the most :)
>> >> >> > >
>> >> >> > > - Evans
>> >> >> > >
>> >> >> > > [1]
>> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
>> >> >> > >
>> >> >> > > Luca Toscano <to...@gmail.com> 於 2020年4月8日 週三 21:03 寫道:
>> >> >> > >>
>> >> >> > >> Hi everybody,
>> >> >> > >>
>> >> >> > >> most of the bugs/issues/etc.. that I found while upgrading
>> from CDH 5
>> >> >> > >> to BigTop 1.4 are fixed, I am now testing (as suggested also
>> in here)
>> >> >> > >> upgrade/rollback procedures for HDFS (all written in
>> >> >> > >> https://phabricator.wikimedia.org/T244499, will add
>> documentation
>> >> >> > >> about this at the end I promise).
>> >> >> > >>
>> >> >> > >> I initially followed [1][2] in my Test cluster, choosing the
>> Rolling
>> >> >> > >> upgrade, but when I tried to rollback (after days since the
>> initial
>> >> >> > >> upgrade) I ended up in an inconsistent state and I wasn't able
>> to
>> >> >> > >> recover the previous HDFS state. I didn't save the exact error
>> >> >> > >> messages but the situation was more or less the following:
>> >> >> > >>
>> >> >> > >> FS-Image-rollback (created at the time of the upgrade) - up to
>> transaction X
>> >> >> > >> FS-Image-current - up to transaction Y, with Y = X + 10000
>> (number
>> >> >> > >> totally made up for the example)
>> >> >> > >> QJM cluster: first available transaction Z = X + 10000 + 1
>> >> >> > >>
>> >> >> > >> When I tried to rolling rollback, the Namenode complained
>> about a hole
>> >> >> > >> in the transaction log, namely at X + 1, so it refused to
>> start. I
>> >> >> > >> tried to force a regular rollback, but the Namenode refused
>> again
>> >> >> > >> saying that there was no available FS Image to roll back to. I
>> checked
>> >> >> > >> in the Hadoop code and indeed the Namenode saves the fs image
>> with
>> >> >> > >> different naming/path in case of a rolling upgrade or a regular
>> >> >> > >> upgrade. Both cases make sense, especially the first one since
>> there
>> >> >> > >> was indeed a hole between the last transaction of the
>> >> >> > >> FS-Image-rollback and the first available transaction to reply
>> on the
>> >> >> > >> QJM cluster. I chose the rolling upgrade initially since it was
>> >> >> > >> appealing: it promises to bring back the Namenodes to their
>> previous
>> >> >> > >> versions, but keeping the data modified between upgrade and
>> rollback.
>> >> >> > >>
>> >> >> > >> I then found [3], in which it is said that with QJM everything
>> is more
>> >> >> > >> complicated, and a regular rollback is the only option
>> available. What
>> >> >> > >> I think this mean is that due to the Edit log spread among
>> multiple
>> >> >> > >> nodes, a rollback that keeps data between upgrade and rollback
>> is not
>> >> >> > >> available, so worst case scenario the data modified during that
>> >> >> > >> timeframe is lost. Not a big deal in my case, but I want to
>> triple
>> >> >> > >> check with you if this is the correct interpretation or if
>> there is
>> >> >> > >> another tutorial/guide/etc.. that I haven't read with a
>> different
>> >> >> > >> procedure :)
>> >> >> > >>
>> >> >> > >> Is my interpretation correct? If not, is there anybody with
>> experience
>> >> >> > >> in HDFS upgrades that could shed some light on the subject?
>> >> >> > >>
>> >> >> > >> Thanks in advance!
>> >> >> > >>
>> >> >> > >> Luca
>> >> >> > >>
>> >> >> > >>
>> >> >> > >>
>> >> >> > >> [1]
>> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
>> >> >> > >> [2]
>> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>> >> >> > >> [3]
>> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled
>>
>

Re: Testing rollback after HDFS upgrade

Posted by Evans Ye <ev...@apache.org>.
Hi Luca,

I'm on vacation and don't have time to review right now; I'll get back
to you next week.

The doc is definitely valuable. Once your production is migrated
successfully, we can show other users that this is a battle-proven
solution. Even better, we could give a talk at ApacheCon or elsewhere to
further amplify the impact of the work. This is a clear open source
success story, so I think it deserves a talk.

Evans


Luca Toscano <to...@gmail.com> 於 2020年8月27日 週四 下午9:11寫道:

> Hi Evans,
>
> it took a while I know but I have the first version of the gdoc for the
> upgrade:
>
>
> https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit?usp=sharing
>
> I tried to list all the steps involved in migrating from CDH 5 to
> Bigtop 1.4, anybody interested should be able to comment. The idea
> that I have is to discuss this for a few days and then possibly make
> it permanent somewhere in the Bigtop wiki? (of course if the document
> will be considered useful for others etc..)
>
> During these days I tested the procedure multiple times, and I have
> also tested the HDFS finalize step, everything works as expected. I
> hope to be able to move to Bigtop during the next couple of months.
>
> Luca
>
> On Tue, Jul 21, 2020 at 4:04 PM Evans Ye <ev...@apache.org> wrote:
> >
> > Yes. I think a shared gdoc is prefered, and you can open up a JIRA
> ticket to track it.
> >
> > Luca Toscano <to...@gmail.com> 於 2020年7月20日 週一 21:10 寫道:
> >>
> >> Hi Evans!
> >>
> >> What is the best medium to use for the documentation/comments ? A
> >> shared gdoc or something similar?
> >>
> >> Luca
> >>
> >> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <ev...@apache.org> wrote:
> >> >
> >> > One thing I think would be great to have is a doc version of the
> steps for upgrade and rollback. The benefits:
> >> > 1. Anything unexpected happened during automation, you do have folks
> can quickly understand what's going on and get into the investigation.
> >> > 2. Share the doc with us to help the others OSS users for doing the
> migration. For the env specific things I think that's fine. We can left
> comment on it. At least all the other users can get a high level view of a
> proven solution. And then they can go and find out the rest of the pieces
> by themselves.
> >> >
> >> > For automations, I suggest to split up the automation into several
> stages, and apply some validation steps(manually is ok) before kicking of
> the next stage.
> >> >
> >> > Best,
> >> > Evans
> >> >
> >> >
> >> >
> >> >
> >> > Luca Toscano <to...@gmail.com> 於 2020年7月15日 週三 下午9:07寫道:
> >> >>
> >> >> Hi everybody,
> >> >>
> >> >> I didn't get the time to work on this until recently, but I finally
> >> >> managed to have a reliable procedure to upgrade from CDH to Bigtop
> 1.4
> >> >> and rollback if needed. The assumptions are:
> >> >>
> >> >> 1) It is ok to have (limited) cluster downtime.
> >> >> 2) Rolling upgrade is not needed.
> >> >> 3) QJM is used.
> >> >>
> >> >> The procedure is listed in these two scripts:
> >> >>
> >> >>
> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
> >> >>
> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
> >> >>
> >> >> The code is highly dependent on my working environment, but it should
> >> >> be clear to follow when writing a tutorial about how to migrate from
> >> >> CDH to Bigtop. All the suggestions given by this mailing list were
> >> >> really useful to reach a solution!
> >> >>
> >> >> My next steps will be:
> >> >>
> >> >> 1) Keep testing Bigtop 1.4 (finalize HDFS upgrade, run more hadoop
> >> >> jobs, test Hive 2, etc..).
> >> >> 2) Upgrade the production Hadoop cluster to Bigtop 1.4 on Debian 9
> >> >> (HDFS 2.6.0-cdh -> 2.8.5).
> >> >> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
> >> >> 4) Upgrade to Debian 10.
> >> >>
> >> >> With automation it shouldn't be very difficult, I'll report progress
> once made.
> >> >>
> >> >> Thanks a lot!
> >> >>
> >> >> Luca
> >> >>
> >> >> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <to...@gmail.com>
> wrote:
> >> >> >
> >> >> > Hi Evans,
> >> >> >
> >> >> > thanks a lot for the feedback, it was exactly what I needed. The
> >> >> > simpler the better is definitely a good advice in this use case,
> I'll
> >> >> > try this week another rollout/rollback and report back :)
> >> >> >
> >> >> > Luca
> >> >> >
> >> >> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <ev...@apache.org>
> wrote:
> >> >> > >
> >> >> > > Hi Luca,
> >> >> > >
> >> >> > > Thanks for reporting back and let us know how it goes.
> >> >> > > I don't have the exactly HDFS with QJM HA upgrade experience.
> The experience I had was 0.20 non-HA upgrade to 2.0 non-HA and then enable
> QJM HA, which was back in 2014.
> >> >> > >
> >> >> > > Regarding to rollback, I think you're right:
> >> >> > >
> >> >> > > it is possible to rollback to HDFS’ state before the upgrade in
> case of unexpected problems.
> >> >> > >
> >> >> > > My previous experience is the same that the rollback is merely a
> snapshot before the upgrade. If you've gone far, then rollback cost more
> data lost... Our runbook is if our sanity check failed during upgrade
> downtime, we perform the rollback immediately.
> >> >> > >
> >> >> > > Regarding to that FSImage hole issue, I've experienced it as
> well.
> >> >> > > I managed to fix it by manually edit the FSImage with offline
> image viewer[1] and delete that missing editLog in FSImage. That actually
> brought my cluster back with a little number of missing blocks.
> >> >> > >
> >> >> > > Our experience says that the more the steps, the more the chance
> you failed the upgrade. We did good on dozen times of testing, DEV cluster,
> STAGING cluster, but still got missing blocks when upgrading Production...
> >> >> > >
> >> >> > > The suggestion is to get your production in good shape first(the
> less decommissioned, offline DNs, disk failures, the better).
> >> >> > > Also, maybe you can switch to non-HA mode and do the upgrade to
> simplify the things?
> >> >> > >
> >> >> > > Not many helps but please let us know if any progress.
> >> >> > > Last one, have you reached out to Hadoop community? the authors
> should know the most :)
> >> >> > >
> >> >> > > - Evans
> >> >> > >
> >> >> > > [1]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
> >> >> > >
> >> >> > > Luca Toscano <to...@gmail.com> 於 2020年4月8日 週三 21:03 寫道:
> >> >> > >>
> >> >> > >> Hi everybody,
> >> >> > >>
> >> >> > >> most of the bugs/issues/etc.. that I found while upgrading from
> CDH 5
> >> >> > >> to BigTop 1.4 are fixed, I am now testing (as suggested also in
> here)
> >> >> > >> upgrade/rollback procedures for HDFS (all written in
> >> >> > >> https://phabricator.wikimedia.org/T244499, will add
> documentation
> >> >> > >> about this at the end I promise).
> >> >> > >>
> >> >> > >> I initially followed [1][2] in my Test cluster, choosing the
> Rolling
> >> >> > >> upgrade, but when I tried to rollback (after days since the
> initial
> >> >> > >> upgrade) I ended up in an inconsistent state and I wasn't able
> to
> >> >> > >> recover the previous HDFS state. I didn't save the exact error
> >> >> > >> messages but the situation was more or less the following:
> >> >> > >>
> >> >> > >> FS-Image-rollback (created at the time of the upgrade) - up to
> transaction X
> >> >> > >> FS-Image-current - up to transaction Y, with Y = X + 10000
> (number
> >> >> > >> totally made up for the example)
> >> >> > >> QJM cluster: first available transaction Z = X + 10000 + 1
> >> >> > >>
> >> >> > >> When I tried to rolling rollback, the Namenode complained about
> a hole
> >> >> > >> in the transaction log, namely at X + 1, so it refused to
> start. I
> >> >> > >> tried to force a regular rollback, but the Namenode refused
> again
> >> >> > >> saying that there was no available FS Image to roll back to. I
> checked
> >> >> > >> in the Hadoop code and indeed the Namenode saves the fs image
> with
> >> >> > >> different naming/path in case of a rolling upgrade or a regular
> >> >> > >> upgrade. Both cases make sense, especially the first one since
> there
> >> >> > >> was indeed a hole between the last transaction of the
> >> >> > >> FS-Image-rollback and the first available transaction to reply
> on the
> >> >> > >> QJM cluster. I chose the rolling upgrade initially since it was
> >> >> > >> appealing: it promises to bring back the Namenodes to their
> previous
> >> >> > >> versions, but keeping the data modified between upgrade and
> rollback.
> >> >> > >>
> >> >> > >> I then found [3], in which it is said that with QJM everything
> is more
> >> >> > >> complicated, and a regular rollback is the only option
> available. What
> >> >> > >> I think this mean is that due to the Edit log spread among
> multiple
> >> >> > >> nodes, a rollback that keeps data between upgrade and rollback
> is not
> >> >> > >> available, so worst case scenario the data modified during that
> >> >> > >> timeframe is lost. Not a big deal in my case, but I want to
> triple
> >> >> > >> check with you if this is the correct interpretation or if
> there is
> >> >> > >> another tutorial/guide/etc.. that I haven't read with a
> different
> >> >> > >> procedure :)
> >> >> > >>
> >> >> > >> Is my interpretation correct? If not, is there anybody with
> experience
> >> >> > >> in HDFS upgrades that could shed some light on the subject?
> >> >> > >>
> >> >> > >> Thanks in advance!
> >> >> > >>
> >> >> > >> Luca
> >> >> > >>
> >> >> > >>
> >> >> > >>
> >> >> > >> [1]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
> >> >> > >> [2]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
> >> >> > >> [3]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled
>

Re: Testing rollback after HDFS upgrade

Posted by Luca Toscano <to...@gmail.com>.
Hi Evans,

it took a while, I know, but I have the first version of the gdoc for the upgrade:

https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit?usp=sharing

I tried to list all the steps involved in migrating from CDH 5 to
Bigtop 1.4; anybody interested should be able to comment. My idea is to
discuss it for a few days and then possibly make it permanent somewhere
in the Bigtop wiki (assuming, of course, that the document turns out to
be useful for others).

Over the past few days I tested the procedure multiple times, including
the HDFS finalize step, and everything works as expected. I hope to be
able to move to Bigtop during the next couple of months.

Luca

On Tue, Jul 21, 2020 at 4:04 PM Evans Ye <ev...@apache.org> wrote:
>
> Yes. I think a shared gdoc is prefered, and you can open up a JIRA ticket to track it.
>
> Luca Toscano <to...@gmail.com> 於 2020年7月20日 週一 21:10 寫道:
>>
>> Hi Evans!
>>
>> What is the best medium to use for the documentation/comments ? A
>> shared gdoc or something similar?
>>
>> Luca
>>
>> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <ev...@apache.org> wrote:
>> >
>> > One thing I think would be great to have is a doc version of the steps for upgrade and rollback. The benefits:
>> > 1. Anything unexpected happened during automation, you do have folks can quickly understand what's going on and get into the investigation.
>> > 2. Share the doc with us to help the others OSS users for doing the migration. For the env specific things I think that's fine. We can left comment on it. At least all the other users can get a high level view of a proven solution. And then they can go and find out the rest of the pieces by themselves.
>> >
>> > For automation, I suggest splitting the work into several stages, and applying some validation steps (manual is OK) before kicking off the next stage.
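The staged-automation suggestion quoted above can be sketched as follows. This is only an illustration: the stage names and the validation hook are made up and are not taken from the actual Wikimedia cookbooks.

```python
# Sketch of staged automation: run each upgrade stage, then pause for a
# validation step before continuing to the next one. Stage names are
# hypothetical, not part of any real cookbook.

def run_stages(stages, validate):
    """Run (name, action) stages in order; stop at the first failed validation."""
    completed = []
    for name, action in stages:
        action()
        if not validate(name):
            return completed, name  # halt before starting the next stage
        completed.append(name)
    return completed, None

# Example with dummy stages: validation fails after "upgrade-namenodes",
# so "upgrade-datanodes" is never started.
log = []
stages = [
    ("stop-cluster", lambda: log.append("stopped")),
    ("upgrade-namenodes", lambda: log.append("upgraded NNs")),
    ("upgrade-datanodes", lambda: log.append("upgraded DNs")),
]
ok_stages = {"stop-cluster"}
done, failed_at = run_stages(stages, lambda name: name in ok_stages)
print(done, failed_at)  # ['stop-cluster'] upgrade-namenodes
```

In practice the `validate` hook could simply prompt an operator to confirm the manual sanity checks before the run continues.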
>> >
>> > Best,
>> > Evans
>> >
>> >
>> >
>> >
>> > Luca Toscano <to...@gmail.com> 於 2020年7月15日 週三 下午9:07寫道:
>> >>
>> >> Hi everybody,
>> >>
>> >> I didn't get the time to work on this until recently, but I finally
>> >> managed to have a reliable procedure to upgrade from CDH to Bigtop 1.4
>> >> and rollback if needed. The assumptions are:
>> >>
>> >> 1) It is ok to have (limited) cluster downtime.
>> >> 2) Rolling upgrade is not needed.
>> >> 3) QJM is used.
>> >>
>> >> The procedure is listed in these two scripts:
>> >>
>> >> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
>> >> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
>> >>
>> >> The code is highly dependent on my working environment, but it should
>> >> be clear to follow when writing a tutorial about how to migrate from
>> >> CDH to Bigtop. All the suggestions given by this mailing list were
>> >> really useful to reach a solution!
>> >>
>> >> My next steps will be:
>> >>
>> >> 1) Keep testing Bigtop 1.4 (finalize HDFS upgrade, run more hadoop
>> >> jobs, test Hive 2, etc..).
>> >> 2) Upgrade the production Hadoop cluster to Bigtop 1.4 on Debian 9
>> >> (HDFS 2.6.0-cdh -> 2.8.5).
>> >> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
>> >> 4) Upgrade to Debian 10.
>> >>
>> >> With automation it shouldn't be very difficult, I'll report progress once made.
>> >>
>> >> Thanks a lot!
>> >>
>> >> Luca
>> >>
>> >> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <to...@gmail.com> wrote:
>> >> >
>> >> > Hi Evans,
>> >> >
>> >> > thanks a lot for the feedback, it was exactly what I needed. The
>> >> > simpler the better is definitely a good advice in this use case, I'll
>> >> > try this week another rollout/rollback and report back :)
>> >> >
>> >> > Luca
>> >> >
>> >> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <ev...@apache.org> wrote:
>> >> > >
>> >> > > Hi Luca,
>> >> > >
>> >> > > Thanks for reporting back and let us know how it goes.
>> >> > > I don't have the exactly HDFS with QJM HA upgrade experience. The experience I had was 0.20 non-HA upgrade to 2.0 non-HA and then enable QJM HA, which was back in 2014.
>> >> > >
>> >> > > Regarding to rollback, I think you're right:
>> >> > >
>> >> > > it is possible to rollback to HDFS’ state before the upgrade in case of unexpected problems.
>> >> > >
>> >> > > My previous experience is the same that the rollback is merely a snapshot before the upgrade. If you've gone far, then rollback cost more data lost... Our runbook is if our sanity check failed during upgrade downtime, we perform the rollback immediately.
>> >> > >
>> >> > > Regarding to that FSImage hole issue, I've experienced it as well.
>> >> > > I managed to fix it by manually edit the FSImage with offline image viewer[1] and delete that missing editLog in FSImage. That actually brought my cluster back with a little number of missing blocks.
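The offline-image-viewer repair described in the quoted message can be explored along these lines. The sketch assumes an XML dump produced by something like `hdfs oiv -p XML -i fsimage -o fsimage.xml`; the element layout below is heavily simplified for the example and is not the exact oiv output format.

```python
# Hypothetical sketch: scan a (simplified) XML dump of an fsimage and list
# the file inodes it contains, as a first step before hand-editing it.
import xml.etree.ElementTree as ET

SAMPLE = """<fsimage>
  <INodeSection>
    <inode><id>16385</id><type>DIRECTORY</type><name></name></inode>
    <inode><id>16386</id><type>FILE</type><name>part-00000</name></inode>
    <inode><id>16387</id><type>FILE</type><name>part-00001</name></inode>
  </INodeSection>
</fsimage>"""

def file_inodes(xml_text):
    """Return the names of all FILE-type inodes in the dump."""
    root = ET.fromstring(xml_text)
    return [inode.findtext("name")
            for inode in root.iter("inode")
            if inode.findtext("type") == "FILE"]

print(file_inodes(SAMPLE))  # ['part-00000', 'part-00001']
```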
>> >> > >
>> >> > > Our experience says that the more the steps, the more the chance you failed the upgrade. We did good on dozen times of testing, DEV cluster, STAGING cluster, but still got missing blocks when upgrading Production...
>> >> > >
>> >> > > The suggestion is to get your production in good shape first(the less decommissioned, offline DNs, disk failures, the better).
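A pre-upgrade health check along the lines suggested above could be sketched like this. The report text is an approximation of `hdfs dfsadmin -report` output; the exact labels may differ between Hadoop versions, so treat the parsing as illustrative only.

```python
# Hedged sketch of a pre-upgrade health check: parse the summary lines of a
# dfsadmin -report dump and flag dead or decommissioning DataNodes.
import re

SAMPLE_REPORT = """Configured Capacity: 120000000000 (111.76 GB)
Live datanodes (5):
Dead datanodes (1):
Decommissioning datanodes (0):
"""

def unhealthy_datanodes(report_text):
    """Return (dead, decommissioning) DataNode counts from the report text."""
    def count(label):
        match = re.search(rf"{label} datanodes \((\d+)\)", report_text)
        return int(match.group(1)) if match else 0
    return count("Dead"), count("Decommissioning")

dead, decom = unhealthy_datanodes(SAMPLE_REPORT)
print(dead, decom)  # 1 0 -> one dead DataNode, so don't start the upgrade yet
```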
>> >> > > Also, maybe you can switch to non-HA mode and do the upgrade to simplify the things?
>> >> > >
>> >> > > Not many helps but please let us know if any progress.
>> >> > > Last one, have you reached out to Hadoop community? the authors should know the most :)
>> >> > >
>> >> > > - Evans
>> >> > >
>> >> > > [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
>> >> > >
>> >> > > Luca Toscano <to...@gmail.com> 於 2020年4月8日 週三 21:03 寫道:
>> >> > >>
>> >> > >> Hi everybody,
>> >> > >>
>> >> > >> most of the bugs/issues/etc.. that I found while upgrading from CDH 5
>> >> > >> to BigTop 1.4 are fixed, I am now testing (as suggested also in here)
>> >> > >> upgrade/rollback procedures for HDFS (all written in
>> >> > >> https://phabricator.wikimedia.org/T244499, will add documentation
>> >> > >> about this at the end I promise).
>> >> > >>
>> >> > >> I initially followed [1][2] in my Test cluster, choosing the Rolling
>> >> > >> upgrade, but when I tried to rollback (after days since the initial
>> >> > >> upgrade) I ended up in an inconsistent state and I wasn't able to
>> >> > >> recover the previous HDFS state. I didn't save the exact error
>> >> > >> messages but the situation was more or less the following:
>> >> > >>
>> >> > >> FS-Image-rollback (created at the time of the upgrade) - up to transaction X
>> >> > >> FS-Image-current - up to transaction Y, with Y = X + 10000 (number
>> >> > >> totally made up for the example)
>> >> > >> QJM cluster: first available transaction Z = X + 10000 + 1
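The hole described in this scenario comes down to simple transaction-id arithmetic: a rolling rollback can only replay edits if the journal still holds the transaction right after the rollback image's last txid. A minimal sketch, with X, Y and Z mirroring the made-up numbers above:

```python
# Sketch of the gap check implied by the scenario: the rollback fsimage ends
# at txid X, but the QJM journals only start at Z = Y + 1, so edits
# X+1 .. Y are no longer available to replay.

def rollback_has_gap(rollback_image_txid, first_journal_txid):
    """True if the journal no longer covers txid rollback_image_txid + 1."""
    return first_journal_txid > rollback_image_txid + 1

X = 1000          # last txid in the rollback fsimage
Y = X + 10000     # last txid in the current fsimage (made-up number)
Z = Y + 1         # first txid still available in the QJM cluster

print(rollback_has_gap(X, Z))      # True: edits X+1 .. Y were purged
print(rollback_has_gap(X, X + 1))  # False: journal still starts at X+1
```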
>> >> > >>
>> >> > >> When I tried to rolling rollback, the Namenode complained about a hole
>> >> > >> in the transaction log, namely at X + 1, so it refused to start. I
>> >> > >> tried to force a regular rollback, but the Namenode refused again
>> >> > >> saying that there was no available FS Image to roll back to. I checked
>> >> > >> in the Hadoop code and indeed the Namenode saves the fs image with
>> >> > >> different naming/path in case of a rolling upgrade or a regular
>> >> > >> upgrade. Both cases make sense, especially the first one since there
>> >> > >> was indeed a hole between the last transaction of the
>> >> > >> FS-Image-rollback and the first available transaction to reply on the
>> >> > >> QJM cluster. I chose the rolling upgrade initially since it was
>> >> > >> appealing: it promises to bring back the Namenodes to their previous
>> >> > >> versions, but keeping the data modified between upgrade and rollback.
>> >> > >>
>> >> > >> I then found [3], in which it is said that with QJM everything is more
>> >> > >> complicated, and a regular rollback is the only option available. What
>> >> > >> I think this mean is that due to the Edit log spread among multiple
>> >> > >> nodes, a rollback that keeps data between upgrade and rollback is not
>> >> > >> available, so worst case scenario the data modified during that
>> >> > >> timeframe is lost. Not a big deal in my case, but I want to triple
>> >> > >> check with you if this is the correct interpretation or if there is
>> >> > >> another tutorial/guide/etc.. that I haven't read with a different
>> >> > >> procedure :)
>> >> > >>
>> >> > >> Is my interpretation correct? If not, is there anybody with experience
>> >> > >> in HDFS upgrades that could shed some light on the subject?
>> >> > >>
>> >> > >> Thanks in advance!
>> >> > >>
>> >> > >> Luca
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
>> >> > >> [2] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>> >> > >> [3] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled

Re: Testing rollback after HDFS upgrade

Posted by Evans Ye <ev...@apache.org>.
Yes, I think a shared gdoc is preferred, and you can open a JIRA ticket
to track it.

Luca Toscano <to...@gmail.com> 於 2020年7月20日 週一 21:10 寫道:

> Hi Evans!
>
> What is the best medium to use for the documentation/comments ? A
> shared gdoc or something similar?
>
> Luca
>
> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <ev...@apache.org> wrote:
> >
> > One thing I think would be great to have is a doc version of the steps
> for upgrade and rollback. The benefits:
> > 1. Anything unexpected happened during automation, you do have folks can
> quickly understand what's going on and get into the investigation.
> > 2. Share the doc with us to help the others OSS users for doing the
> migration. For the env specific things I think that's fine. We can left
> comment on it. At least all the other users can get a high level view of a
> proven solution. And then they can go and find out the rest of the pieces
> by themselves.
> >
> > For automations, I suggest to split up the automation into several
> stages, and apply some validation steps(manually is ok) before kicking of
> the next stage.
> >
> > Best,
> > Evans
> >
> >
> >
> >
> > Luca Toscano <to...@gmail.com> 於 2020年7月15日 週三 下午9:07寫道:
> >>
> >> Hi everybody,
> >>
> >> I didn't get the time to work on this until recently, but I finally
> >> managed to have a reliable procedure to upgrade from CDH to Bigtop 1.4
> >> and rollback if needed. The assumptions are:
> >>
> >> 1) It is ok to have (limited) cluster downtime.
> >> 2) Rolling upgrade is not needed.
> >> 3) QJM is used.
> >>
> >> The procedure is listed in these two scripts:
> >>
> >>
> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
> >>
> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
> >>
> >> The code is highly dependent on my working environment, but it should
> >> be clear to follow when writing a tutorial about how to migrate from
> >> CDH to Bigtop. All the suggestions given by this mailing list were
> >> really useful to reach a solution!
> >>
> >> My next steps will be:
> >>
> >> 1) Keep testing Bigtop 1.4 (finalize HDFS upgrade, run more hadoop
> >> jobs, test Hive 2, etc..).
> >> 2) Upgrade the production Hadoop cluster to Bigtop 1.4 on Debian 9
> >> (HDFS 2.6.0-cdh -> 2.8.5).
> >> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
> >> 4) Upgrade to Debian 10.
> >>
> >> With automation it shouldn't be very difficult, I'll report progress
> once made.
> >>
> >> Thanks a lot!
> >>
> >> Luca
> >>
> >> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <to...@gmail.com>
> wrote:
> >> >
> >> > Hi Evans,
> >> >
> >> > thanks a lot for the feedback, it was exactly what I needed. The
> >> > simpler the better is definitely a good advice in this use case, I'll
> >> > try this week another rollout/rollback and report back :)
> >> >
> >> > Luca
> >> >

Re: Testing rollback after HDFS upgrade

Posted by Luca Toscano <to...@gmail.com>.
Hi Evans!

What is the best medium to use for the documentation/comments? A
shared gdoc or something similar?

Luca

On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <ev...@apache.org> wrote:
>
> One thing I think would be great to have is a doc version of the upgrade and rollback steps. The benefits:
> 1. If anything unexpected happens during automation, other folks can quickly understand what's going on and join the investigation.
> 2. Sharing the doc with us helps other OSS users do the migration. The env-specific parts are fine; we can leave comments on them. At the very least, other users get a high-level view of a proven solution and can then work out the remaining pieces themselves.
>
> For automation, I suggest splitting it into several stages and applying some validation steps (manual is ok) before kicking off the next stage.
>
> Best,
> Evans
>
>
>
>
> Luca Toscano <to...@gmail.com> wrote on Wed, Jul 15, 2020 at 9:07 PM:
>>
>> Hi everybody,
>>
>> I didn't get the time to work on this until recently, but I finally
>> managed to have a reliable procedure to upgrade from CDH to Bigtop 1.4
>> and roll back if needed. The assumptions are:
>>
>> 1) It is ok to have (limited) cluster downtime.
>> 2) Rolling upgrade is not needed.
>> 3) QJM is used.
>>
>> The procedure is listed in these two scripts:
>>
>> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
>> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
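
For readers who don't want to parse the cookbook code, the stop-the-world sequence can be sketched roughly as follows. This is a hedged outline only: the `hdfs dfsadmin`/`hdfs namenode` flags are standard HDFS admin commands, but the service names and package commands below are illustrative placeholders, not taken from the actual scripts.

```python
# Hypothetical sketch of a non-rolling (stop-the-world) HDFS upgrade and
# rollback. Service and package names are assumptions for illustration.

def upgrade_plan():
    """Return the ordered shell commands for a non-rolling HDFS upgrade."""
    return [
        # 1. Quiesce the cluster: no new writes, flush a clean fsimage.
        "hdfs dfsadmin -safemode enter",
        "hdfs dfsadmin -saveNamespace",
        # 2. Stop daemons everywhere, then swap the packages (placeholder).
        "systemctl stop hadoop-yarn-resourcemanager hadoop-hdfs-namenode hadoop-hdfs-datanode",
        "apt-get install <new-hadoop-packages>",
        # 3. Restart the NameNode with -upgrade so it writes a rollback image.
        "hdfs namenode -upgrade",
        # 4. Once validated, make the upgrade permanent (rollback no longer possible).
        "hdfs dfsadmin -finalizeUpgrade",
    ]

def rollback_plan():
    """Return the commands to abandon the upgrade (data written since is lost)."""
    return [
        "systemctl stop hadoop-hdfs-namenode hadoop-hdfs-datanode",
        "apt-get install <old-hadoop-packages>",
        # -rollback restores the pre-upgrade fsimage saved by -upgrade.
        "hdfs namenode -rollback",
    ]
```

In practice `-upgrade`/`-rollback` are usually passed through the NameNode service's start options rather than run in the foreground.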
>>
>> The code is highly dependent on my working environment, but it should
>> be straightforward to follow when writing a tutorial on how to migrate
>> from CDH to Bigtop. All the suggestions from this mailing list were
>> really useful in reaching a solution!
>>
>> My next steps will be:
>>
>> 1) Keep testing Bigtop 1.4 (finalize HDFS upgrade, run more hadoop
>> jobs, test Hive 2, etc..).
>> 2) Upgrade the production Hadoop cluster to Bigtop 1.4 on Debian 9
>> (HDFS 2.6.0-cdh -> 2.8.5).
>> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
>> 4) Upgrade to Debian 10.
>>
>> With automation it shouldn't be very difficult; I'll report progress as I make it.
>>
>> Thanks a lot!
>>
>> Luca
>>
>> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <to...@gmail.com> wrote:
>> >
>> > Hi Evans,
>> >
>> > thanks a lot for the feedback, it was exactly what I needed. The
>> > simpler the better is definitely a good advice in this use case, I'll
>> > try this week another rollout/rollback and report back :)
>> >
>> > Luca
>> >
>> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <ev...@apache.org> wrote:
>> > >
>> > > Hi Luca,
>> > >
>> > > Thanks for reporting back and letting us know how it goes.
>> > > I don't have experience with this exact HDFS-with-QJM-HA upgrade. What I did, back in 2014, was upgrade 0.20 non-HA to 2.0 non-HA and then enable QJM HA.
>> > >
>> > > Regarding rollback, I think you're right:
>> > >
>> > > it is possible to roll back to HDFS’ state before the upgrade in case of unexpected problems.
>> > >
>> > > My previous experience matches: the rollback is merely a snapshot taken before the upgrade. The further you've gone, the more data a rollback loses... Our runbook is: if our sanity checks fail during the upgrade downtime, we perform the rollback immediately.
>> > >
>> > > Regarding that FSImage hole issue, I've experienced it as well.
>> > > I managed to fix it by manually editing the FSImage with the offline image viewer [1] and deleting the reference to the missing edit log. That brought my cluster back with only a small number of missing blocks.
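
This repair can be sketched, very much hedged, as the round-trip below: `hdfs oiv` and its XML/ReverseXML processors exist in Hadoop 2.8+, but hand-editing an fsimage is a last resort and everything should be backed up first. The helper only builds the command strings.

```python
# Hedged sketch of an fsimage inspect/edit round-trip with the Offline
# Image Viewer. This builds the commands only; it does not run them.

def oiv_roundtrip_cmds(fsimage):
    """Return the oiv commands to dump an fsimage to XML and rebuild it."""
    return [
        # Dump the binary fsimage to editable XML.
        "hdfs oiv -p XML -i {0} -o {0}.xml".format(fsimage),
        # ... hand-edit the XML to drop the broken reference, then ...
        # Rebuild a binary fsimage from the edited XML (Hadoop 2.8+).
        "hdfs oiv -p ReverseXML -i {0}.xml -o {0}.repaired".format(fsimage),
    ]
```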
>> > >
>> > > Our experience is that the more steps there are, the greater the chance the upgrade fails. We did well across dozens of test runs on the DEV and STAGING clusters, but still got missing blocks when upgrading production...
>> > >
>> > > The suggestion is to get your production cluster in good shape first (the fewer decommissioned/offline DNs and disk failures, the better).
>> > > Also, maybe you can switch to non-HA mode for the upgrade to simplify things?
>> > >
>> > > Not much help, but please let us know of any progress.
>> > > One last thing: have you reached out to the Hadoop community? The authors should know best :)
>> > >
>> > > - Evans
>> > >
>> > > [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
>> > >
>> > > Luca Toscano <to...@gmail.com> wrote on Wed, Apr 8, 2020 at 21:03:
>> > >>
>> > >> Hi everybody,
>> > >>
>> > >> most of the bugs/issues/etc. that I found while upgrading from CDH 5
>> > >> to Bigtop 1.4 are fixed, and I am now testing (as also suggested
>> > >> here) upgrade/rollback procedures for HDFS (all written up in
>> > >> https://phabricator.wikimedia.org/T244499; I promise to add
>> > >> documentation about this at the end).
>> > >>
>> > >> I initially followed [1][2] on my test cluster, choosing the rolling
>> > >> upgrade, but when I tried to roll back (days after the initial
>> > >> upgrade) I ended up in an inconsistent state and wasn't able to
>> > >> recover the previous HDFS state. I didn't save the exact error
>> > >> messages, but the situation was more or less the following:
>> > >>
>> > >> FS-Image-rollback (created at upgrade time): up to transaction X
>> > >> FS-Image-current: up to transaction Y, with Y = X + 10000 (number
>> > >> totally made up for the example)
>> > >> QJM cluster: first available transaction Z = X + 10000 + 1
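
The mismatch can be illustrated with a tiny (non-Hadoop) check for a hole in the transaction sequence:

```python
# Illustration of the gap the Namenode detected: the rollback image ends
# at transaction X, but the first edit still available on the QJM is
# Z = Y + 1, so transactions X+1..Y cannot be replayed.

def first_hole(last_image_txid, first_qjm_txid):
    """Return the first missing transaction id, or None if contiguous."""
    if first_qjm_txid > last_image_txid + 1:
        return last_image_txid + 1
    return None

X = 100                    # made-up ids, as in the example above
Z = X + 10000 + 1          # first edit available on the QJM
assert first_hole(X, Z) == X + 1   # the hole starts right after the image
assert first_hole(X, X + 1) is None  # no hole when edits pick up at X+1
```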
>> > >>
>> > >> When I tried the rolling rollback, the Namenode complained about a
>> > >> hole in the transaction log, namely at X + 1, so it refused to start.
>> > >> I then tried to force a regular rollback, but the Namenode refused
>> > >> again, saying that there was no FS image available to roll back to. I
>> > >> checked the Hadoop code, and indeed the Namenode saves the FS image
>> > >> under a different name/path for a rolling upgrade versus a regular
>> > >> upgrade. Both cases make sense, especially the first one, since there
>> > >> was indeed a hole between the last transaction of the
>> > >> FS-Image-rollback and the first transaction available for replay on
>> > >> the QJM cluster. I initially chose the rolling upgrade since it was
>> > >> appealing: it promises to bring the Namenodes back to their previous
>> > >> versions while keeping the data modified between upgrade and rollback.
>> > >>
>> > >> I then found [3], which says that with QJM everything is more
>> > >> complicated, and a regular rollback is the only option available.
>> > >> What I think this means is that, because the edit log is spread
>> > >> across multiple nodes, a rollback that keeps data written between
>> > >> upgrade and rollback is not available, so in the worst case the data
>> > >> modified during that timeframe is lost. Not a big deal in my case,
>> > >> but I want to triple-check with you that this is the correct
>> > >> interpretation, and that there isn't another tutorial/guide/etc.
>> > >> with a different procedure that I haven't read :)
>> > >>
>> > >> Is my interpretation correct? If not, is there anybody with experience
>> > >> in HDFS upgrades that could shed some light on the subject?
>> > >>
>> > >> Thanks in advance!
>> > >>
>> > >> Luca
>> > >>
>> > >>
>> > >>
>> > >> [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
>> > >> [2] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>> > >> [3] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled

Re: Testing rollback after HDFS upgrade

Posted by Evans Ye <ev...@apache.org>.
Hey Luca,

Thanks for getting back to us. That sounds very promising.

One thing I think would be great to have is a doc version of the upgrade
and rollback steps. The benefits:
1. If anything unexpected happens during automation, other folks can
quickly understand what's going on and join the investigation.
2. Sharing the doc with us helps other OSS users do the migration. The
env-specific parts are fine; we can leave comments on them. At the very
least, other users get a high-level view of a proven solution and can
then work out the remaining pieces themselves.

For automation, I suggest splitting it into several stages and applying
some validation steps (manual is ok) before kicking off the next stage.
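
That staged pattern can be sketched as follows (stage names are illustrative, not taken from the actual cookbooks):

```python
# Minimal sketch of a staged migration with a validation gate between
# stages: a failed check stops the rollout before the next (harder to
# undo) stage begins.

def run_staged(stages):
    """Run (name, action, validate) stages; stop at the first failed check."""
    done = []
    for name, action, validate in stages:
        action()
        if not validate():           # a manual confirmation prompt also works
            raise RuntimeError("validation failed after stage: " + name)
        done.append(name)
    return done

stages = [
    ("stop-cluster",   lambda: None, lambda: True),
    ("swap-packages",  lambda: None, lambda: True),
    ("start-upgraded", lambda: None, lambda: True),
]
assert run_staged(stages) == ["stop-cluster", "swap-packages", "start-upgraded"]
```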

Best,
Evans



