Posted to dev@bigtop.apache.org by Kengo Seki <se...@apache.org> on 2020/09/14 04:44:18 UTC

PPC CI server failure

Hi everyone,

Let me share information about the CI environment.
The worker node for ppc64le is currently offline, so I just killed all the
jobs in the queue that were waiting for it to come back. Its status is as follows.

- According to the output of `who -b`, the machine seems to have been
  rebooted on 2020-09-11 for some reason (probably unexpectedly).

- According to the output of dmesg, the root volume was mounted
  in read-only mode because of an fsck failure.

  [   34.840681] EXT4-fs (vda1): Couldn't remount RDWR because of unprocessed orphan inode list.  Please umount/remount instead
  [   60.714110] cgroup: new mount options do not match the existing superblock, will be ignored
  [  316.385805] EXT4-fs (vda1): error count since last fsck: 9459
  [  316.385824] EXT4-fs (vda1): initial error at time 1540294049: ext4_validate_inode_bitmap:134
  [  316.385826] EXT4-fs (vda1): last error at time 1596881526: ext4_free_inode:383
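
As a side note, the two "at time" values in the log above are Unix epoch
seconds, so they can be decoded to see when the errors were recorded.
What follows is a minimal sketch, assuming Python 3 is available; the
helper name decode_ext4_errors and the regular expression are made up
for illustration and are not part of any tool used on the worker.

```python
import re
from datetime import datetime, timezone

# Log lines copied from the dmesg excerpt above (unwrapped).
DMESG_LINES = [
    "[  316.385824] EXT4-fs (vda1): initial error at time 1540294049: ext4_validate_inode_bitmap:134",
    "[  316.385826] EXT4-fs (vda1): last error at time 1596881526: ext4_free_inode:383",
]

# Matches "<initial|last> error at time <epoch seconds>: <function>:<line>".
PATTERN = re.compile(r"(initial|last) error at time (\d+): (\w+):(\d+)")


def decode_ext4_errors(lines):
    """Return {kind: (UTC datetime, kernel function)} for each matching line."""
    decoded = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            kind, epoch, func, _lineno = match.groups()
            when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
            decoded[kind] = (when, func)
    return decoded


for kind, (when, func) in decode_ext4_errors(DMESG_LINES).items():
    print(f"{kind:>7} error: {when:%Y-%m-%d %H:%M:%S} UTC in {func}")
```

Decoded this way, the initial error falls on 2018-10-23 and the last one
on 2020-08-08 (both UTC), so the filesystem errors were being recorded
well before the reboot on 2020-09-11.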

It looks like some fsck work (and replacing the volume, if that fails)
is required, but I'm not sure whether I should run something like
`e2fsck -p`, because I also don't know where that machine is hosted
or who manages it.
(I vaguely thought it was running as a VM with QEMU on some EC2
instance, but I couldn't find it.)
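
Before any repair is attempted, the read-only state can be confirmed
without touching the disk. The sketch below is only an illustration, not
something that was run on the worker: it parses text in the /proc/mounts
format and reports whether a mount point carries the "ro" option. The
helper names and the sample lines are hypothetical; on a real host the
text would come from reading /proc/mounts.

```python
def mount_options(mounts_text, mount_point="/"):
    """Return the option list of the last entry for mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones for the same mount point.
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point="/"):
    """True if mount_point is listed and its options include 'ro'."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


# Hypothetical sample in /proc/mounts format (not taken from the worker).
SAMPLE = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
/dev/vda1 / ext4 ro,relatime 0 0
"""

print(is_read_only(SAMPLE))
```

On the machine itself, the same check would be
is_read_only(open("/proc/mounts").read()).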

> Cos, Evans, Olaf
Would you provide any suggestions?

Kengo Seki <se...@apache.org>

Re: PPC CI server failure

Posted by Kengo Seki <se...@apache.org>.
Hi Amir,

The problem I previously reported was resolved in BIGTOP-3537, so I've
enabled ppc64le builds on CI.
Then I came across some failures. See the following comment for details.
https://issues.apache.org/jira/browse/BIGTOP-3533?focusedCommentId=17352207&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17352207

Kengo Seki <se...@apache.org>

On Mon, May 10, 2021 at 9:36 AM Kengo Seki <se...@apache.org> wrote:
>
> > Let me know when the ppc64le CI/CD is going to get enabled to help us identify the failing components.
>
> Sure! As a first step, I added the ppc64le configuration back to the
> Docker-related jobs.
>
> https://ci.bigtop.apache.org/view/Docker/job/Docker-Puppet-Trunk/
> https://ci.bigtop.apache.org/view/Docker/job/Docker-Puppet-Trunk-pull/
> https://ci.bigtop.apache.org/view/Docker/job/Docker-Toolchain-Trunk/
> https://ci.bigtop.apache.org/view/Docker/job/Docker-Toolchain-Trunk-pull/
>
> However, Docker-Puppet-Trunk failed, only on CentOS 8, due to the PowerTools
> repository setting, for a reason I haven't identified yet.
>
> https://ci.bigtop.apache.org/view/Docker/job/Docker-Puppet-Trunk/24/
>
> I'll keep investigating and let you know when there's any progress.
>
> Kengo Seki <se...@apache.org>
>
> On Mon, May 3, 2021 at 10:36 PM MrAsanjar . <as...@apache.org> wrote:
> >
> > Let me know when the ppc64le CI/CD is going to get enabled to help us
> > identify the failing components.
> >
> > On Fri, Apr 16, 2021 at 8:00 PM Kengo Seki <se...@apache.org> wrote:
> >
> > > Sorry for my late response, I was quite busy this week...
> > > Amir, thank you for recovering the ppc64le server! I've just enabled
> > > it on Jenkins and it seems to be healthy. I'm going to work on
> > > BIGTOP-3533.
> > > Also thanks to Evans and Olaf for helping him.
> > >
> > > Kengo Seki <se...@apache.org>
> > >
> > > On Sat, Apr 17, 2021 at 3:50 AM Olaf Flebbe <of...@oflebbe.de> wrote:
> > > >
> > > > I already gave the public key to asanjar.
> > > >
> > > > Olaf
> > > >
> > > > > > On Apr 16, 2021, at 10:49, Evans Ye <ev...@apache.org> wrote:
> > > > > >
> > > > > > Let me help. I was busy with something.
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 15, 2021 at 10:30 PM MrAsanjar . <as...@apache.org> wrote:
> > > > > >
> > > > > >> In order to set up the new Jenkins slave for ppc64le
> > > > > >> (https://issues.apache.org/jira/browse/BIGTOP-3534), we need the
> > > > > >> Jenkins master's public SSH key. Who can help me here?
> > > > >>
> > > > >> On Fri, Apr 2, 2021 at 4:00 PM MrAsanjar <af...@gmail.com> wrote:
> > > > >>
> > > > > >>> I have verified the state of the ppc64le VM; it is operational. Could we
> > > > > >>> enable the ppc64le build before OpenStack flags the VM as idle again?
> > > > >>>
> > > > >>> On Thu, Apr 1, 2021 at 4:08 PM MrAsanjar <af...@gmail.com> wrote:
> > > > >>>
> > > > >>>> Hi lads
> > > > >>>> I just got an email that IBM has reinstated the ppc64le VM.
> > > > >>>>
> > > > >>>>
> > > > > >>>> On Mon, Mar 29, 2021 at 12:05 PM Evans Ye <ev...@apache.org> wrote:
> > > > >>>>
> > > > >>>>> Great news and thanks, Amir!
> > > > >>>>>
> > > > > >>>>> On Mon, Mar 29, 2021 at 1:54 PM Jun HE <ju...@apache.org> wrote:
> > > > > >>>>>
> > > > > >>>>>> Awesome! Looking forward to having it back in CI.
> > > > > >>>>>> Thanks a lot for helping with this, Asanjar!
> > > > >>>>>>
> > > > >>>>>> Regards,
> > > > >>>>>>
> > > > >>>>>> Jun
> > > > >>>>>>
> > > > >>>>>> MrAsanjar <af...@gmail.com> 于2021年3月29日周一 上午10:18写道:
> > > > >>>>>>
> > > > >>>>>>> Hi old friends :)
> > > > >>>>>>> We should have a ppc64le VM back online sometime this week. I'll
> > > > >>>>> keep you
> > > > >>>>>>> all posted.
> > > > >>>>>>>
> > > > >>>>>>> On Thu, Nov 19, 2020 at 9:05 PM Evans Ye <ev...@apache.org>
> > > > >> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi rbkrishn,
> > > > >>>>>>>>
> > > > >>>>>>>> Would you mind to comment whether those PPC servers for Bigtop
> > > CI
> > > > >>>>> can
> > > > >>>>>> be
> > > > >>>>>>>> brought up and unlock our release process?
> > > > >>>>>>>> Thanks!
> > > > >>>>>>>>
> > > > >>>>>>>> Best,
> > > > >>>>>>>> Evans
> > > > >>>>>>>>
> > > > >>>>>>>> Kengo Seki <se...@apache.org> 於 2020年11月18日 週三 上午7:26寫道:
> > > > >>>>>>>>
> > > > >>>>>>>>> Thank you for checking, Evans and Amir!
> > > > >>>>>>>>>
> > > > >>>>>>>>> Kengo Seki <se...@apache.org>
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Wed, Nov 18, 2020 at 2:09 AM Evans Ye <ev...@apache.org>
> > > > >>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Thank you, Amir.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> MrAsanjar <af...@gmail.com> 於 2020年11月18日 週三 00:39 寫道:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi Evans, let me check with IBM again.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Mon, Nov 16, 2020 at 9:08 PM Evans Ye <
> > > > >> evansye@apache.org
> > > > >>>>>>
> > > > >>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Hi Amir,
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> We're planning Bigtop 1.5 release and if we don't have
> > > > >> the
> > > > >>>>> CI
> > > > >>>>>>> nodes
> > > > >>>>>>>>> for
> > > > >>>>>>>>>>>> PPC, we're not able to release 1.5 with PPC supported.
> > > > >>>>>>>>>>>> Could you help to confirm again? Thanks!
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Best,
> > > > >>>>>>>>>>>> Evans Ye
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> MrAsanjar <af...@gmail.com> 於 2020年9月17日 週四 下午8:56寫道:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> I have informed IBM management regarding the situation,
> > > > >>>>>> waiting
> > > > >>>>>>>>> for a
> > > > >>>>>>>>>>>>> reply.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Thu, Sep 17, 2020 at 3:47 AM Evans Ye <
> > > > >>>>> evansye@apache.org
> > > > >>>>>>>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Ok. Thanks for doing this to get the ball rolling.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Kengo Seki <se...@apache.org> 於 2020年9月17日 週四 10:29
> > > > >>>>> 寫道:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Thank you for your help, Amir!
> > > > >>>>>>>>>>>>>>> It's just a heads-up, I temporarily disabled builds
> > > > >>>>> for
> > > > >>>>>> ppc
> > > > >>>>>>>> in
> > > > >>>>>>>>> the
> > > > >>>>>>>>>>>>>>> following Jenkins jobs so that they can finish.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> * Docker-Puppet-Trunk
> > > > >>>>>>>>>>>>>>> * Docker-Puppet-Trunk-pull
> > > > >>>>>>>>>>>>>>> * Docker-Toolchain-Trunk
> > > > >>>>>>>>>>>>>>> * Docker-Toolchain-Trunk-pull
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> * Bigtop-trunk-packages
> > > > >>>>>>>>>>>>>>> * Bigtop-trunk-repos
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> * Remove-All-Docker-Containers-Except-Nexus
> > > > >>>>>>>>>>>>>>> * Remove-Dangling-Docker-Images
> > > > >>>>>>>>>>>>>>> * Remove-Inactive-Containers
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Kengo Seki <se...@apache.org>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> On Wed, Sep 16, 2020 at 7:35 PM Evans Ye <
> > > > >>>>>>> evansye@apache.org
> > > > >>>>>>>>>
> > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Awesome! Nice to hear from you, buddy!
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> MrAsanjar <af...@gmail.com> 於 2020年9月16日 週三
> > > > >>>>>> 上午3:54寫道:
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Hi Evans,
> > > > >>>>>>>>>>>>>>>>> Let me see what I can do. Give me 24 hr :)
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> On Tue, Sep 15, 2020 at 10:51 AM Evans Ye <
> > > > >>>>>>>>> evansye@apache.org>
> > > > >>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Yes. I think the action is correct. However
> > > > >> [2]
> > > > >>>>>> might
> > > > >>>>>>>> be
> > > > >>>>>>>>> a
> > > > >>>>>>>>>>>>>> different
> > > > >>>>>>>>>>>>>>>>> thing
> > > > >>>>>>>>>>>>>>>>>> for PPC integration in Hadoop.
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Amir,
> > > > >>>>>>>>>>>>>>>>>> Could you confirm?
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Kengo Seki <se...@apache.org> 於 2020年9月14日
> > > > >> 週一
> > > > >>>>>>>> 下午9:56寫道:
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Thank you for the advice, Evans!
> > > > >>>>>>>>>>>>>>>>>>> Let me confirm about "PPC machine owners".
> > > > >>>>>> According
> > > > >>>>>>>> to
> > > > >>>>>>>>>>> Amir's
> > > > >>>>>>>>>>>>>> JIRA
> > > > >>>>>>>>>>>>>>>>>>> issues [1][2] and the powered-by list in the
> > > > >>>>> OSU
> > > > >>>>>>> site
> > > > >>>>>>>>> [3],
> > > > >>>>>>>>>>>> we're
> > > > >>>>>>>>>>>>>>> using
> > > > >>>>>>>>>>>>>>>>>>> a VM hosted by OSU OSL, right?
> > > > >>>>>>>>>>>>>>>>>>> If it's correct, I'm going to ask them for
> > > > >>>>> help
> > > > >>>>>> via
> > > > >>>>>>>>>>>>>>>>>>> powerdev-request@osuosl.org.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> [1]:
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>
> > > https://issues.apache.org/jira/browse/INFRA-11467?focusedCommentId=15300982&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15300982
> > > > >>>>>>>>>>>>>>>>>>> [2]:
> > > > >>>>>>>> https://issues.apache.org/jira/browse/INFRA-12014
> > > > >>>>>>>>>>>>>>>>>>> [3]:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>
> > > https://osuosl.org/services/powerdev/current-projects/#foss-projects
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Kengo Seki <se...@apache.org>
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> On Mon, Sep 14, 2020 at 2:06 PM Evans Ye <
> > > > >>>>>>>>>>> evansye@apache.org>
> > > > >>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> I'd suggest to reach out to PPC machine
> > > > >>>>> owners.
> > > > >>>>>>>> Worst
> > > > >>>>>>>>> case
> > > > >>>>>>>>>>>> Is
> > > > >>>>>>>>>>>>> we
> > > > >>>>>>>>>>>>>>> can
> > > > >>>>>>>>>>>>>>>>>>>> temporary  drop the PPC support to move
> > > > >> the
> > > > >>>>>>> release
> > > > >>>>>>>>>>> forward.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Kengo Seki <se...@apache.org> 於
> > > > >>>>> 2020年9月14日 週一
> > > > >>>>>>>> 12:44
> > > > >>>>>>>>> 寫道:
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Let me share information about the CI
> > > > >>>>>>> environment.
> > > > >>>>>>>>>>>>>>>>>>>>> The worker node for ppc64le is currently
> > > > >>>>>>> offlined,
> > > > >>>>>>>>> so I
> > > > >>>>>>>>>>>> just
> > > > >>>>>>>>>>>>>>> killed
> > > > >>>>>>>>>>>>>>>>>>> all
> > > > >>>>>>>>>>>>>>>>>>>>> jobs
> > > > >>>>>>>>>>>>>>>>>>>>> in the queue waiting for it gets back.
> > > > >> Its
> > > > >>>>>>> status
> > > > >>>>>>>>> is as
> > > > >>>>>>>>>>>>>> follows.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> - According to the result of `who -b`,
> > > > >>>>> that
> > > > >>>>>>>> machine
> > > > >>>>>>>>>>> seems
> > > > >>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>>>>> rebooted
> > > > >>>>>>>>>>>>>>>>>>>>>  on 2020-09-11 for some reason
> > > > >> (probably
> > > > >>>>>>>>> unexpectedly).
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> - According to the result of dmesg, the
> > > > >>>>> root
> > > > >>>>>>>> volume
> > > > >>>>>>>>> was
> > > > >>>>>>>>>>>>>> mounted
> > > > >>>>>>>>>>>>>>>>>>>>>  in read-only mode because of a fsck
> > > > >>>>> failure.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>  [   34.840681] EXT4-fs (vda1):
> > > > >> Couldn't
> > > > >>>>>>> remount
> > > > >>>>>>>>> RDWR
> > > > >>>>>>>>>>>>> because
> > > > >>>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>>>>> unprocessed orphan inode list.  Please
> > > > >>>>>>>>> umount/remount
> > > > >>>>>>>>>>>>> instead
> > > > >>>>>>>>>>>>>>>>>>>>>  [   60.714110] cgroup: new mount
> > > > >>>>> options do
> > > > >>>>>>> not
> > > > >>>>>>>>> match
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> existing
> > > > >>>>>>>>>>>>>>>>>>>>> superblock, will be ignored
> > > > >>>>>>>>>>>>>>>>>>>>>  [  316.385805] EXT4-fs (vda1): error
> > > > >>>>> count
> > > > >>>>>>> since
> > > > >>>>>>>>> last
> > > > >>>>>>>>>>>>> fsck:
> > > > >>>>>>>>>>>>>>> 9459
> > > > >>>>>>>>>>>>>>>>>>>>>  [  316.385824] EXT4-fs (vda1): initial
> > > > >>>>> error
> > > > >>>>>>> at
> > > > >>>>>>>>> time
> > > > >>>>>>>>>>>>>>> 1540294049:
> > > > >>>>>>>>>>>>>>>>>>>>> ext4_validate_inode_bitmap:134
> > > > >>>>>>>>>>>>>>>>>>>>>  [  316.385826] EXT4-fs (vda1): last
> > > > >>>>> error at
> > > > >>>>>>>> time
> > > > >>>>>>>>>>>>>> 1596881526:
> > > > >>>>>>>>>>>>>>>>>>>>> ext4_free_inode:383
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> It looks like some fsck work (and
> > > > >>>>> replacing
> > > > >>>>>> the
> > > > >>>>>>>>> volume,
> > > > >>>>>>>>>>> if
> > > > >>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>> fails)
> > > > >>>>>>>>>>>>>>>>>>>>> are required,
> > > > >>>>>>>>>>>>>>>>>>>>> but I'm not sure if I could run
> > > > >> something
> > > > >>>>> like
> > > > >>>>>>>>> `e2fsck
> > > > >>>>>>>>>>>> -p`,
> > > > >>>>>>>>>>>>>>> because
> > > > >>>>>>>>>>>>>>>>>>>>> I'm also not sure
> > > > >>>>>>>>>>>>>>>>>>>>> where does that machine exist or who's
> > > > >>>>>> managing
> > > > >>>>>>>> it.
> > > > >>>>>>>>>>>>>>>>>>>>> (I slightly thought it was running as a
> > > > >> VM
> > > > >>>>>> with
> > > > >>>>>>>>> QEMU on
> > > > >>>>>>>>>>>> some
> > > > >>>>>>>>>>>>>> EC2
> > > > >>>>>>>>>>>>>>>>>>>>> instance, but I couldn't find it)
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> Cos, Evans, Olaf
> > > > >>>>>>>>>>>>>>>>>>>>> Would you provide any suggestions?
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Kengo Seki <se...@apache.org>
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>
> > > >
> > >

Re: PPC CI server failure

Posted by Kengo Seki <se...@apache.org>.
> Let me know when the ppc64le CI/CD is going to get enabled to help us identify the failing components.

Sure! As a first step, I added the ppc64le configuration back to the
Docker-related jobs.

https://ci.bigtop.apache.org/view/Docker/job/Docker-Puppet-Trunk/
https://ci.bigtop.apache.org/view/Docker/job/Docker-Puppet-Trunk-pull/
https://ci.bigtop.apache.org/view/Docker/job/Docker-Toolchain-Trunk/
https://ci.bigtop.apache.org/view/Docker/job/Docker-Toolchain-Trunk-pull/

However, Docker-Puppet-Trunk failed, only on CentOS 8, due to the PowerTools
repository setting, for a reason I haven't identified yet.

https://ci.bigtop.apache.org/view/Docker/job/Docker-Puppet-Trunk/24/

I'll keep investigating and let you know when there's any progress.

Kengo Seki <se...@apache.org>


Re: PPC CI server failure

Posted by "MrAsanjar ." <as...@apache.org>.
Let me know when the ppc64le CI/CD is going to get enabled to help us
identify the failing components.

On Fri, Apr 16, 2021 at 8:00 PM Kengo Seki <se...@apache.org> wrote:

> Sorry for my late response, I was quite busy this week...
> Amir, thank you for recovering the ppc64le server! I've just enabled
> it on Jenkins and it seems to be healthy. I'm going to work on
> BIGTOP-3533.
> Also thanks to Evans and Olaf for helping him.
>
> Kengo Seki <se...@apache.org>
>
> On Sat, Apr 17, 2021 at 3:50 AM Olaf Flebbe <of...@oflebbe.de> wrote:
> >
> > I already gave the public key to asanjar.
> >
> > Olaf
> >
> > > Am 16.04.2021 um 10:49 schrieb Evans Ye <ev...@apache.org>:
> > >
> > > Let me help. I was busy on a thing.
> > >
> > >
> > > MrAsanjar . <as...@apache.org> 於 2021年4月15日 週四 下午10:30寫道:
> > >
> > >> In order to set up the new Jenkins slave for ppc64le (
> > >> https://issues.apache.org/jira/browse/BIGTOP-3534) we need Jenkins
> > >> master's
> > >> public ssh key. Who can help me here?
> > >>
> > >> On Fri, Apr 2, 2021 at 4:00 PM MrAsanjar <af...@gmail.com> wrote:
> > >>
> > >>> I have verified the state of ppc64le VM, it is operational. Could we
> > >>> enable the ppc64le build before OpenStack flag the VM as ideal again.
> > >>>
> > >>> On Thu, Apr 1, 2021 at 4:08 PM MrAsanjar <af...@gmail.com> wrote:
> > >>>
> > >>>> Hi lads
> > >>>> I just got an email that IBM has reinstated the ppc64le VM.
> > >>>>
> > >>>>
> > >>>> On Mon, Mar 29, 2021 at 12:05 PM Evans Ye <ev...@apache.org>
> wrote:
> > >>>>
> > >>>>> Great news and thanks, Amir!
> > >>>>>
> > >>>>> Jun HE <ju...@apache.org> 於 2021年3月29日 週一 下午1:54寫道:
> > >>>>>
> > >>>>>> Awesome! Looking forward to its back to CI.
> > >>>>>> Thanks a lot for helping on this, Asanjar!
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Jun
> > >>>>>>
> > >>>>>> MrAsanjar <af...@gmail.com> 于2021年3月29日周一 上午10:18写道:
> > >>>>>>
> > >>>>>>> Hi old friends :)
> > >>>>>>> We should have a ppc64le VM back online sometime this week. I'll
> > >>>>> keep you
> > >>>>>>> all posted.
> > >>>>>>>
> > >>>>>>> On Thu, Nov 19, 2020 at 9:05 PM Evans Ye <ev...@apache.org>
> > >> wrote:
> > >>>>>>>
> > >>>>>>>> Hi rbkrishn,
> > >>>>>>>>
> > >>>>>>>> Would you mind to comment whether those PPC servers for Bigtop
> CI
> > >>>>> can
> > >>>>>> be
> > >>>>>>>> brought up and unlock our release process?
> > >>>>>>>> Thanks!
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Evans
> > >>>>>>>>
> > >>>>>>>> On Wed, Nov 18, 2020 at 7:26 AM Kengo Seki <se...@apache.org> wrote:
> > >>>>>>>>
> > >>>>>>>>> Thank you for checking, Evans and Amir!
> > >>>>>>>>>
> > >>>>>>>>> Kengo Seki <se...@apache.org>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Nov 18, 2020 at 2:09 AM Evans Ye <ev...@apache.org> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Thank you, Amir.
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Nov 18, 2020 at 00:39 MrAsanjar <af...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi Evans, let me check with IBM again.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Mon, Nov 16, 2020 at 9:08 PM Evans Ye <evansye@apache.org> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi Amir,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> We're planning the Bigtop 1.5 release, and if we don't have the CI nodes
> > >>>>>>>>>>>> for PPC, we won't be able to release 1.5 with PPC support.
> > >>>>>>>>>>>> Could you help confirm again? Thanks!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>> Evans Ye
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Thu, Sep 17, 2020 at 8:56 PM MrAsanjar <af...@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> I have informed IBM management regarding the situation and am waiting for a
> > >>>>>>>>>>>>> reply.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Thu, Sep 17, 2020 at 3:47 AM Evans Ye <evansye@apache.org> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Ok. Thanks for doing this to get the ball rolling.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Thu, Sep 17, 2020 at 10:29 Kengo Seki <se...@apache.org> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thank you for your help, Amir!
> > >>>>>>>>>>>>>>> Just a heads-up: I temporarily disabled the ppc builds in the
> > >>>>>>>>>>>>>>> following Jenkins jobs so that they can finish.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> * Docker-Puppet-Trunk
> > >>>>>>>>>>>>>>> * Docker-Puppet-Trunk-pull
> > >>>>>>>>>>>>>>> * Docker-Toolchain-Trunk
> > >>>>>>>>>>>>>>> * Docker-Toolchain-Trunk-pull
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> * Bigtop-trunk-packages
> > >>>>>>>>>>>>>>> * Bigtop-trunk-repos
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> * Remove-All-Docker-Containers-Except-Nexus
> > >>>>>>>>>>>>>>> * Remove-Dangling-Docker-Images
> > >>>>>>>>>>>>>>> * Remove-Inactive-Containers
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Kengo Seki <se...@apache.org>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Wed, Sep 16, 2020 at 7:35 PM Evans Ye <evansye@apache.org> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Awesome! Nice to hear from you, buddy!
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Wed, Sep 16, 2020 at 3:54 AM MrAsanjar <af...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Hi Evans,
> > >>>>>>>>>>>>>>>>> Let me see what I can do. Give me 24 hr :)
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Tue, Sep 15, 2020 at 10:51 AM Evans Ye <evansye@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Yes. I think the action is correct. However, [2] might be a different
> > >>>>>>>>>>>>>>>>>> thing for PPC integration in Hadoop.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Amir,
> > >>>>>>>>>>>>>>>>>> Could you confirm?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Mon, Sep 14, 2020 at 9:56 PM Kengo Seki <se...@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Thank you for the advice, Evans!
> > >>>>>>>>>>>>>>>>>>> Let me confirm who the "PPC machine owners" are. According to Amir's JIRA
> > >>>>>>>>>>>>>>>>>>> issues [1][2] and the powered-by list on the OSU site [3], we're using
> > >>>>>>>>>>>>>>>>>>> a VM hosted by OSU OSL, right?
> > >>>>>>>>>>>>>>>>>>> If that's correct, I'm going to ask them for help via
> > >>>>>>>>>>>>>>>>>>> powerdev-request@osuosl.org.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> [1]: https://issues.apache.org/jira/browse/INFRA-11467?focusedCommentId=15300982&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15300982
> > >>>>>>>>>>>>>>>>>>> [2]: https://issues.apache.org/jira/browse/INFRA-12014
> > >>>>>>>>>>>>>>>>>>> [3]: https://osuosl.org/services/powerdev/current-projects/#foss-projects
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Kengo Seki <se...@apache.org>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Mon, Sep 14, 2020 at 2:06 PM Evans Ye <evansye@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> I'd suggest reaching out to the PPC machine owners. In the worst case, we
> > >>>>>>>>>>>>>>>>>>>> can temporarily drop PPC support to move the release forward.
> > >>>>>>>>>>>>>>>>>>>>

Re: PPC CI server failure

Posted by Kengo Seki <se...@apache.org>.
Sorry for my late response, I was quite busy this week...
Amir, thank you for recovering the ppc64le server! I've just enabled
it on Jenkins and it seems to be healthy. I'm going to work on
BIGTOP-3533.
Also thanks to Evans and Olaf for helping him.

Kengo Seki <se...@apache.org>

On Sat, Apr 17, 2021 at 3:50 AM Olaf Flebbe <of...@oflebbe.de> wrote:
>
> I already gave the public key to asanjar.
>
> Olaf
>
> > On 16.04.2021 at 10:49, Evans Ye <ev...@apache.org> wrote:
> >
> > Let me help. I was busy with something.
> >
> >
> > On Thu, Apr 15, 2021 at 10:30 PM MrAsanjar . <as...@apache.org> wrote:
> >
> >> In order to set up the new Jenkins slave for ppc64le
> >> (https://issues.apache.org/jira/browse/BIGTOP-3534), we need the Jenkins
> >> master's public ssh key. Who can help me here?

Re: PPC CI server failure

Posted by Olaf Flebbe <of...@oflebbe.de>.
I already gave the public key to asanjar. 

Olaf

> On 16.04.2021 at 10:49, Evans Ye <ev...@apache.org> wrote:
>
> Let me help. I was busy with something.
>
>
> On Thu, Apr 15, 2021 at 10:30 PM MrAsanjar . <as...@apache.org> wrote:
>
>> In order to set up the new Jenkins slave for ppc64le
>> (https://issues.apache.org/jira/browse/BIGTOP-3534), we need the Jenkins
>> master's public ssh key. Who can help me here?


Re: PPC CI server failure

Posted by Evans Ye <ev...@apache.org>.
Let me help. I was busy with something.


On Thu, Apr 15, 2021 at 10:30 PM MrAsanjar . <as...@apache.org> wrote:

> In order to set up the new Jenkins slave for ppc64le
> (https://issues.apache.org/jira/browse/BIGTOP-3534), we need the Jenkins
> master's public ssh key. Who can help me here?
>
> On Fri, Apr 2, 2021 at 4:00 PM MrAsanjar <af...@gmail.com> wrote:
>
> > I have verified the state of the ppc64le VM; it is operational. Could we
> > enable the ppc64le build before OpenStack flags the VM as idle again?
> >
> > On Thu, Apr 1, 2021 at 4:08 PM MrAsanjar <af...@gmail.com> wrote:
> >
> >> Hi lads
> >> I just got an email that IBM has reinstated the ppc64le VM.
> >>
> >>
> >> On Mon, Mar 29, 2021 at 12:05 PM Evans Ye <ev...@apache.org> wrote:
> >>
> >>> Great news and thanks, Amir!
> >>>
> >>> On Mon, Mar 29, 2021 at 1:54 PM Jun HE <ju...@apache.org> wrote:
> >>>
> >>> > Awesome! Looking forward to having it back in CI.
> >>> > Thanks a lot for helping on this, Asanjar!
> >>> >
> >>> > Regards,
> >>> >
> >>> > Jun
> >>> >
> >>> > On Mon, Mar 29, 2021 at 10:18 AM MrAsanjar <af...@gmail.com> wrote:
> >>> >
> >>> > > Hi old friends :)
> >>> > > We should have a ppc64le VM back online sometime this week. I'll keep you
> >>> > > all posted.
> >>> > >
> >>> > > On Thu, Nov 19, 2020 at 9:05 PM Evans Ye <ev...@apache.org> wrote:
> >>> > >
> >>> > > > Hi rbkrishn,
> >>> > > >
> >>> > > > Would you mind commenting on whether those PPC servers for Bigtop CI can
> >>> > > > be brought back up to unblock our release process?
> >>> > > > Thanks!
> >>> > > >
> >>> > > > Best,
> >>> > > > Evans
> >>> > > >
> >>> > > > On Wed, Nov 18, 2020 at 7:26 AM Kengo Seki <se...@apache.org> wrote:
> >>> > > >
> >>> > > > > Thank you for checking, Evans and Amir!
> >>> > > > >
> >>> > > > > Kengo Seki <se...@apache.org>
> >>> > > > >
> >>> > > > > On Wed, Nov 18, 2020 at 2:09 AM Evans Ye <ev...@apache.org> wrote:
> >>> > > > > >
> >>> > > > > > Thank you, Amir.
> >>> > > > > >
> >>> > > > > > On Wed, Nov 18, 2020 at 00:39 MrAsanjar <af...@gmail.com> wrote:
> >>> > > > > >
> >>> > > > > > > Hi Evans, let me check with IBM again.
> >>> > > > > > >
> >>> > > > > > >
> >>> > > > > > > On Mon, Nov 16, 2020 at 9:08 PM Evans Ye <evansye@apache.org> wrote:
> >>> > > > > > >
> >>> > > > > > > > Hi Amir,
> >>> > > > > > > >
> >>> > > > > > > > We're planning the Bigtop 1.5 release, and if we don't have the CI nodes
> >>> > > > > > > > for PPC, we won't be able to release 1.5 with PPC support.
> >>> > > > > > > > Could you help confirm again? Thanks!
> >>> > > > > > > >
> >>> > > > > > > > Best,
> >>> > > > > > > > Evans Ye
> >>> > > > > > > >
> >>> > > > > > > >
> >>> > > > > > > >
> >>> > > > > > > > On Thu, Sep 17, 2020 at 8:56 PM MrAsanjar <af...@gmail.com> wrote:
> >>> > > > > > > >
> >>> > > > > > > > > I have informed IBM management regarding the situation and am waiting for a
> >>> > > > > > > > > reply.
> >>> > > > > > > > >
> >>> > > > > > > > > On Thu, Sep 17, 2020 at 3:47 AM Evans Ye <evansye@apache.org> wrote:
> >>> > > > > > > > >
> >>> > > > > > > > > > Ok. Thanks for doing this to get the ball rolling.
> >>> > > > > > > > > >
> >>> > > > > > > > > > On Thu, Sep 17, 2020 at 10:29 Kengo Seki <se...@apache.org> wrote:
> >>> > > > > > > > > >
> >>> > > > > > > > > > > Thank you for your help, Amir!
> >>> > > > > > > > > > > It's just a heads-up, I temporarily disabled builds
> >>> for
> >>> > ppc
> >>> > > > in
> >>> > > > > the
> >>> > > > > > > > > > > following Jenkins jobs so that they can finish.
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > * Docker-Puppet-Trunk
> >>> > > > > > > > > > > * Docker-Puppet-Trunk-pull
> >>> > > > > > > > > > > * Docker-Toolchain-Trunk
> >>> > > > > > > > > > > * Docker-Toolchain-Trunk-pull
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > * Bigtop-trunk-packages
> >>> > > > > > > > > > > * Bigtop-trunk-repos
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > * Remove-All-Docker-Containers-Except-Nexus
> >>> > > > > > > > > > > * Remove-Dangling-Docker-Images
> >>> > > > > > > > > > > * Remove-Inactive-Containers
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > Kengo Seki <se...@apache.org>
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > On Wed, Sep 16, 2020 at 7:35 PM Evans Ye <
> >>> > > evansye@apache.org
> >>> > > > >
> >>> > > > > > > wrote:
> >>> > > > > > > > > > > >
> >>> > > > > > > > > > > > Awesome! Nice to hear from you, buddy!
> >>> > > > > > > > > > > >
> >>> > > > > > > > > > > > MrAsanjar <af...@gmail.com> 於 2020年9月16日 週三
> >>> > 上午3:54寫道:
> >>> > > > > > > > > > > >
> >>> > > > > > > > > > > > > Hi Evans,
> >>> > > > > > > > > > > > > Let me see what I can do. Give me 24 hr :)
> >>> > > > > > > > > > > > >
> >>> > > > > > > > > > > > > On Tue, Sep 15, 2020 at 10:51 AM Evans Ye <
> >>> > > > > evansye@apache.org>
> >>> > > > > > > > > > wrote:
> >>> > > > > > > > > > > > >
> >>> > > > > > > > > > > > > > Yes. I think the action is correct. However
> [2]
> >>> > might
> >>> > > > be
> >>> > > > > a
> >>> > > > > > > > > > different
> >>> > > > > > > > > > > > > thing
> >>> > > > > > > > > > > > > > for PPC integration in Hadoop.
> >>> > > > > > > > > > > > > >
> >>> > > > > > > > > > > > > > Amir,
> >>> > > > > > > > > > > > > > Could you confirm?
> >>> > > > > > > > > > > > > >
> >>> > > > > > > > > > > > > > Kengo Seki <se...@apache.org> 於 2020年9月14日
> 週一
> >>> > > > 下午9:56寫道:
> >>> > > > > > > > > > > > > >
> >>> > > > > > > > > > > > > >> Thank you for the advice, Evans!
> >>> > > > > > > > > > > > > >> Let me confirm about "PPC machine owners".
> >>> > According
> >>> > > > to
> >>> > > > > > > Amir's
> >>> > > > > > > > > > JIRA
> >>> > > > > > > > > > > > > >> issues [1][2] and the powered-by list in the
> >>> OSU
> >>> > > site
> >>> > > > > [3],
> >>> > > > > > > > we're
> >>> > > > > > > > > > > using
> >>> > > > > > > > > > > > > >> a VM hosted by OSU OSL, right?
> >>> > > > > > > > > > > > > >> If it's correct, I'm going to ask them for
> >>> help
> >>> > via
> >>> > > > > > > > > > > > > >> powerdev-request@osuosl.org.
> >>> > > > > > > > > > > > > >>
> >>> > > > > > > > > > > > > >> [1]:
> >>> > > > > > > > > > > > > >>
> >>> > > > > > > > > > > > >
> >>> > > > > > > > > > >
> >>> > > > > > > > > >
> >>> > > > > > > > >
> >>> > > > > > > >
> >>> > > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> https://issues.apache.org/jira/browse/INFRA-11467?focusedCommentId=15300982&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15300982
> >>> > > > > > > > > > > > > >> [2]:
> >>> > > > https://issues.apache.org/jira/browse/INFRA-12014
> >>> > > > > > > > > > > > > >> [3]:
> >>> > > > > > > > > > > > >
> >>> > > > > > > > >
> >>> > > > >
> >>> https://osuosl.org/services/powerdev/current-projects/#foss-projects
> >>> > > > > > > > > > > > > >>
> >>> > > > > > > > > > > > > >> Kengo Seki <se...@apache.org>
> >>> > > > > > > > > > > > > >>
> >>> > > > > > > > > > > > > >>
> >>> > > > > > > > > > > > > >> On Mon, Sep 14, 2020 at 2:06 PM Evans Ye <
> >>> > > > > > > evansye@apache.org>
> >>> > > > > > > > > > > wrote:
> >>> > > > > > > > > > > > > >> >
> >>> > > > > > > > > > > > > >> > I'd suggest to reach out to PPC machine
> >>> owners.
> >>> > > > Worst
> >>> > > > > case
> >>> > > > > > > > Is
> >>> > > > > > > > > we
> >>> > > > > > > > > > > can
> >>> > > > > > > > > > > > > >> > temporary  drop the PPC support to move
> the
> >>> > > release
> >>> > > > > > > forward.
> >>> > > > > > > > > > > > > >> >
> >>> > > > > > > > > > > > > >> > Kengo Seki <se...@apache.org> 於
> >>> 2020年9月14日 週一
> >>> > > > 12:44
> >>> > > > > 寫道:
> >>> > > > > > > > > > > > > >> >
> >>> > > > > > > > > > > > > >> > > Hi everyone,
> >>> > > > > > > > > > > > > >> > >
> >>> > > > > > > > > > > > > >> > > Let me share information about the CI
> >>> > > environment.
> >>> > > > > > > > > > > > > >> > > The worker node for ppc64le is currently
> >>> > > offlined,
> >>> > > > > so I
> >>> > > > > > > > just
> >>> > > > > > > > > > > killed
> >>> > > > > > > > > > > > > >> all
> >>> > > > > > > > > > > > > >> > > jobs
> >>> > > > > > > > > > > > > >> > > in the queue waiting for it gets back.
> Its
> >>> > > status
> >>> > > > > is as
> >>> > > > > > > > > > follows.
> >>> > > > > > > > > > > > > >> > >
> >>> > > > > > > > > > > > > >> > > - According to the result of `who -b`,
> >>> that
> >>> > > > machine
> >>> > > > > > > seems
> >>> > > > > > > > to
> >>> > > > > > > > > > be
> >>> > > > > > > > > > > > > >> rebooted
> >>> > > > > > > > > > > > > >> > >   on 2020-09-11 for some reason
> (probably
> >>> > > > > unexpectedly).
> >>> > > > > > > > > > > > > >> > >
> >>> > > > > > > > > > > > > >> > > - According to the result of dmesg, the
> >>> root
> >>> > > > volume
> >>> > > > > was
> >>> > > > > > > > > > mounted
> >>> > > > > > > > > > > > > >> > >   in read-only mode because of a fsck
> >>> failure.
> >>> > > > > > > > > > > > > >> > >
> >>> > > > > > > > > > > > > >> > >   [   34.840681] EXT4-fs (vda1):
> Couldn't
> >>> > > remount
> >>> > > > > RDWR
> >>> > > > > > > > > because
> >>> > > > > > > > > > > of
> >>> > > > > > > > > > > > > >> > > unprocessed orphan inode list.  Please
> >>> > > > > umount/remount
> >>> > > > > > > > > instead
> >>> > > > > > > > > > > > > >> > >   [   60.714110] cgroup: new mount
> >>> options do
> >>> > > not
> >>> > > > > match
> >>> > > > > > > > the
> >>> > > > > > > > > > > existing
> >>> > > > > > > > > > > > > >> > > superblock, will be ignored
> >>> > > > > > > > > > > > > >> > >   [  316.385805] EXT4-fs (vda1): error
> >>> count
> >>> > > since
> >>> > > > > last
> >>> > > > > > > > > fsck:
> >>> > > > > > > > > > > 9459
> >>> > > > > > > > > > > > > >> > >   [  316.385824] EXT4-fs (vda1): initial
> >>> error
> >>> > > at
> >>> > > > > time
> >>> > > > > > > > > > > 1540294049:
> >>> > > > > > > > > > > > > >> > > ext4_validate_inode_bitmap:134
> >>> > > > > > > > > > > > > >> > >   [  316.385826] EXT4-fs (vda1): last
> >>> error at
> >>> > > > time
> >>> > > > > > > > > > 1596881526:
> >>> > > > > > > > > > > > > >> > > ext4_free_inode:383
> >>> > > > > > > > > > > > > >> > >
> >>> > > > > > > > > > > > > >> > > It looks like some fsck work (and
> >>> replacing
> >>> > the
> >>> > > > > volume,
> >>> > > > > > > if
> >>> > > > > > > > > it
> >>> > > > > > > > > > > fails)
> >>> > > > > > > > > > > > > >> > > are required,
> >>> > > > > > > > > > > > > >> > > but I'm not sure if I could run
> something
> >>> like
> >>> > > > > `e2fsck
> >>> > > > > > > > -p`,
> >>> > > > > > > > > > > because
> >>> > > > > > > > > > > > > >> > > I'm also not sure
> >>> > > > > > > > > > > > > >> > > where does that machine exist or who's
> >>> > managing
> >>> > > > it.
> >>> > > > > > > > > > > > > >> > > (I slightly thought it was running as a
> VM
> >>> > with
> >>> > > > > QEMU on
> >>> > > > > > > > some
> >>> > > > > > > > > > EC2
> >>> > > > > > > > > > > > > >> > > instance, but I couldn't find it)
> >>> > > > > > > > > > > > > >> > >
> >>> > > > > > > > > > > > > >> > > > Cos, Evans, Olaf
> >>> > > > > > > > > > > > > >> > > Would you provide any suggestions?
> >>> > > > > > > > > > > > > >> > >
> >>> > > > > > > > > > > > > >> > > Kengo Seki <se...@apache.org>
> >>> > > > > > > > > > > > > >> > >
> >>> > > > > > > > > > > > > >>
> >>> > > > > > > > > > > > > >
> >>> > > > > > > > > > > > >
> >>> > > > > > > > > > >
> >>> > > > > > > > > >
> >>> > > > > > > > >
> >>> > > > > > > >
> >>> > > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
>

Re: PPC CI server failure

Posted by "MrAsanjar ." <as...@apache.org>.
In order to set up the new Jenkins slave for ppc64le
(https://issues.apache.org/jira/browse/BIGTOP-3534), we need the Jenkins
master's public SSH key. Who can help me here?

On Fri, Apr 2, 2021 at 4:00 PM MrAsanjar <af...@gmail.com> wrote:

> I have verified the state of the ppc64le VM; it is operational. Could we
> enable the ppc64le build before OpenStack flags the VM as idle again?
>
> On Thu, Apr 1, 2021 at 4:08 PM MrAsanjar <af...@gmail.com> wrote:
>
>> Hi lads
>> I just got an email that IBM has reinstated the ppc64le VM.
>>
>>
>> On Mon, Mar 29, 2021 at 12:05 PM Evans Ye <ev...@apache.org> wrote:
>>
>>> Great news and thanks, Amir!
>>>
>>> On Mon, Mar 29, 2021 at 1:54 PM Jun HE <ju...@apache.org> wrote:
>>>
>>> > Awesome! Looking forward to having it back in CI.
>>> > Thanks a lot for helping on this, Asanjar!
>>> >
>>> > Regards,
>>> >
>>> > Jun
>>> >
>>> > On Mon, Mar 29, 2021 at 10:18 AM MrAsanjar <af...@gmail.com> wrote:
>>> >
>>> > > Hi old friends :)
>>> > > We should have a ppc64le VM back online sometime this week. I'll keep
>>> > > you all posted.
>>> > >

Re: PPC CI server failure

Posted by MrAsanjar <af...@gmail.com>.
I have verified the state of the ppc64le VM; it is operational. Could we enable
the ppc64le build before OpenStack flags the VM as idle again?

On Thu, Apr 1, 2021 at 4:08 PM MrAsanjar <af...@gmail.com> wrote:

> Hi lads
> I just got an email that IBM has reinstated the ppc64le VM.
>
>
> On Mon, Mar 29, 2021 at 12:05 PM Evans Ye <ev...@apache.org> wrote:
>
>> Great news and thanks, Amir!
>>
>> On Mon, Mar 29, 2021 at 1:54 PM Jun HE <ju...@apache.org> wrote:
>>
>> > Awesome! Looking forward to having it back in CI.
>> > Thanks a lot for helping on this, Asanjar!
>> >
>> > Regards,
>> >
>> > Jun
>> >
>> > On Mon, Mar 29, 2021 at 10:18 AM MrAsanjar <af...@gmail.com> wrote:
>> >
>> > > Hi old friends :)
>> > > We should have a ppc64le VM back online sometime this week. I'll keep
>> > > you all posted.
>> > >

Re: PPC CI server failure

Posted by MrAsanjar <af...@gmail.com>.
Hi lads
I just got an email that IBM has reinstated the ppc64le VM.


On Mon, Mar 29, 2021 at 12:05 PM Evans Ye <ev...@apache.org> wrote:

> Great news and thanks, Amir!
>
> On Mon, Mar 29, 2021 at 1:54 PM Jun HE <ju...@apache.org> wrote:
>
> > Awesome! Looking forward to having it back in CI.
> > Thanks a lot for helping on this, Asanjar!
> >
> > Regards,
> >
> > Jun
> >
> > On Mon, Mar 29, 2021 at 10:18 AM MrAsanjar <af...@gmail.com> wrote:
> >
> > > Hi old friends :)
> > > We should have a ppc64le VM back online sometime this week. I'll keep
> > > you all posted.
> > >
> > > On Thu, Nov 19, 2020 at 9:05 PM Evans Ye <ev...@apache.org> wrote:
> > >
> > > > Hi rbkrishn,
> > > >
> > > > Would you mind commenting on whether those PPC servers for Bigtop CI
> > > > can be brought up to unblock our release process?
> > > > Thanks!
> > > >
> > > > Best,
> > > > Evans
> > > >
> > > > On Wed, Nov 18, 2020 at 7:26 AM Kengo Seki <se...@apache.org> wrote:
> > > >
> > > > > Thank you for checking, Evans and Amir!
> > > > >
> > > > > Kengo Seki <se...@apache.org>
> > > > >
> > > > > On Wed, Nov 18, 2020 at 2:09 AM Evans Ye <ev...@apache.org> wrote:
> > > > > >
> > > > > > Thank you, Amir.
> > > > > >
> > > > > > On Wed, Nov 18, 2020 at 00:39 MrAsanjar <af...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Evans, let me check with IBM again.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Nov 16, 2020 at 9:08 PM Evans Ye <ev...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi Amir,
> > > > > > > >
> > > > > > > > We're planning the Bigtop 1.5 release, and if we don't have the CI nodes
> > > > > > > > for PPC, we won't be able to release 1.5 with PPC support.
> > > > > > > > Could you help confirm again? Thanks!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Evans Ye
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Sep 17, 2020 at 8:56 PM MrAsanjar <af...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I have informed IBM management regarding the situation and am waiting
> > > > > > > > > for a reply.
> > > > > > > > >
> > > > > > > > > On Thu, Sep 17, 2020 at 3:47 AM Evans Ye <evansye@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Ok. Thanks for doing this to get the ball rolling.
> > > > > > > > > >
> > > > > > > > > > Kengo Seki <se...@apache.org> 於 2020年9月17日 週四 10:29 寫道:
> > > > > > > > > >
> > > > > > > > > > > Thank you for your help, Amir!
> > > > > > > > > > > It's just a heads-up, I temporarily disabled builds for
> > ppc
> > > > in
> > > > > the
> > > > > > > > > > > following Jenkins jobs so that they can finish.
> > > > > > > > > > >
> > > > > > > > > > > * Docker-Puppet-Trunk
> > > > > > > > > > > * Docker-Puppet-Trunk-pull
> > > > > > > > > > > * Docker-Toolchain-Trunk
> > > > > > > > > > > * Docker-Toolchain-Trunk-pull
> > > > > > > > > > >
> > > > > > > > > > > * Bigtop-trunk-packages
> > > > > > > > > > > * Bigtop-trunk-repos
> > > > > > > > > > >
> > > > > > > > > > > * Remove-All-Docker-Containers-Except-Nexus
> > > > > > > > > > > * Remove-Dangling-Docker-Images
> > > > > > > > > > > * Remove-Inactive-Containers
> > > > > > > > > > >
> > > > > > > > > > > Kengo Seki <se...@apache.org>
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Sep 16, 2020 at 7:35 PM Evans Ye <
> > > evansye@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Awesome! Nice to hear from you, buddy!
> > > > > > > > > > > >
> > > > > > > > > > > > MrAsanjar <af...@gmail.com> 於 2020年9月16日 週三
> > 上午3:54寫道:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Evans,
> > > > > > > > > > > > > Let me see what I can do. Give me 24 hr :)
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Sep 15, 2020 at 10:51 AM Evans Ye <
> > > > > evansye@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes. I think the action is correct. However [2]
> > might
> > > > be
> > > > > a
> > > > > > > > > > different
> > > > > > > > > > > > > thing
> > > > > > > > > > > > > > for PPC integration in Hadoop.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Amir,
> > > > > > > > > > > > > > Could you confirm?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Kengo Seki <se...@apache.org> 於 2020年9月14日 週一
> > > > 下午9:56寫道:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> Thank you for the advice, Evans!
> > > > > > > > > > > > > >> Let me confirm about "PPC machine owners".
> > According
> > > > to
> > > > > > > Amir's
> > > > > > > > > > JIRA
> > > > > > > > > > > > > >> issues [1][2] and the powered-by list in the OSU
> > > site
> > > > > [3],
> > > > > > > > we're
> > > > > > > > > > > using
> > > > > > > > > > > > > >> a VM hosted by OSU OSL, right?
> > > > > > > > > > > > > >> If it's correct, I'm going to ask them for help
> > via
> > > > > > > > > > > > > >> powerdev-request@osuosl.org.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> [1]:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/INFRA-11467?focusedCommentId=15300982&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15300982
> > > > > > > > > > > > > >> [2]:
> > > > https://issues.apache.org/jira/browse/INFRA-12014
> > > > > > > > > > > > > >> [3]:
> > > > > > > > > > > > >
> > > > > > > > >
> > > > >
> https://osuosl.org/services/powerdev/current-projects/#foss-projects
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Kengo Seki <se...@apache.org>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Mon, Sep 14, 2020 at 2:06 PM Evans Ye <
> > > > > > > evansye@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > I'd suggest to reach out to PPC machine
> owners.
> > > > Worst
> > > > > case
> > > > > > > > Is
> > > > > > > > > we
> > > > > > > > > > > can
> > > > > > > > > > > > > >> > temporary  drop the PPC support to move the
> > > release
> > > > > > > forward.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Kengo Seki <se...@apache.org> 於 2020年9月14日
> 週一
> > > > 12:44
> > > > > 寫道:
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > > Hi everyone,
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > Let me share information about the CI
> > > environment.
> > > > > > > > > > > > > >> > > The worker node for ppc64le is currently
> > > offlined,
> > > > > so I
> > > > > > > > just
> > > > > > > > > > > killed
> > > > > > > > > > > > > >> all
> > > > > > > > > > > > > >> > > jobs
> > > > > > > > > > > > > >> > > in the queue waiting for it gets back. Its
> > > status
> > > > > is as
> > > > > > > > > > follows.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > - According to the result of `who -b`, that
> > > > machine
> > > > > > > seems
> > > > > > > > to
> > > > > > > > > > be
> > > > > > > > > > > > > >> rebooted
> > > > > > > > > > > > > >> > >   on 2020-09-11 for some reason (probably
> > > > > unexpectedly).
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > - According to the result of dmesg, the root
> > > > volume
> > > > > was
> > > > > > > > > > mounted
> > > > > > > > > > > > > >> > >   in read-only mode because of a fsck
> failure.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > >   [   34.840681] EXT4-fs (vda1): Couldn't
> > > remount
> > > > > RDWR
> > > > > > > > > because
> > > > > > > > > > > of
> > > > > > > > > > > > > >> > > unprocessed orphan inode list.  Please
> > > > > umount/remount
> > > > > > > > > instead
> > > > > > > > > > > > > >> > >   [   60.714110] cgroup: new mount options
> do
> > > not
> > > > > match
> > > > > > > > the
> > > > > > > > > > > existing
> > > > > > > > > > > > > >> > > superblock, will be ignored
> > > > > > > > > > > > > >> > >   [  316.385805] EXT4-fs (vda1): error count
> > > since
> > > > > last
> > > > > > > > > fsck:
> > > > > > > > > > > 9459
> > > > > > > > > > > > > >> > >   [  316.385824] EXT4-fs (vda1): initial
> error
> > > at
> > > > > time
> > > > > > > > > > > 1540294049:
> > > > > > > > > > > > > >> > > ext4_validate_inode_bitmap:134
> > > > > > > > > > > > > >> > >   [  316.385826] EXT4-fs (vda1): last error
> at
> > > > time
> > > > > > > > > > 1596881526:
> > > > > > > > > > > > > >> > > ext4_free_inode:383
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > It looks like some fsck work (and replacing
> > the
> > > > > volume,
> > > > > > > if
> > > > > > > > > it
> > > > > > > > > > > fails)
> > > > > > > > > > > > > >> > > are required,
> > > > > > > > > > > > > >> > > but I'm not sure if I could run something
> like
> > > > > `e2fsck
> > > > > > > > -p`,
> > > > > > > > > > > because
> > > > > > > > > > > > > >> > > I'm also not sure
> > > > > > > > > > > > > >> > > where does that machine exist or who's
> > managing
> > > > it.
> > > > > > > > > > > > > >> > > (I slightly thought it was running as a VM
> > with
> > > > > QEMU on
> > > > > > > > some
> > > > > > > > > > EC2
> > > > > > > > > > > > > >> > > instance, but I couldn't find it)
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > > Cos, Evans, Olaf
> > > > > > > > > > > > > >> > > Would you provide any suggestions?
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > Kengo Seki <se...@apache.org>
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Re: PPC CI server failure

Posted by Evans Ye <ev...@apache.org>.
Great news and thanks, Amir!

On Mon, Mar 29, 2021 at 1:54 PM Jun HE <ju...@apache.org> wrote:

> Awesome! Looking forward to having it back in CI.
> Thanks a lot for helping with this, Asanjar!
>
> Regards,
>
> Jun

Re: PPC CI server failure

Posted by Jun HE <ju...@apache.org>.
Awesome! Looking forward to having it back in CI.
Thanks a lot for helping with this, Asanjar!

Regards,

Jun

On Mon, Mar 29, 2021 at 10:18 AM MrAsanjar <af...@gmail.com> wrote:

> Hi old friends :)
> We should have a ppc64le VM back online sometime this week. I'll keep you
> all posted.

Re: PPC CI server failure

Posted by MrAsanjar <af...@gmail.com>.
Hi old friends :)
We should have a ppc64le VM back online sometime this week. I'll keep you
all posted.

On Thu, Nov 19, 2020 at 9:05 PM Evans Ye <ev...@apache.org> wrote:

> Hi rbkrishn,
>
> Would you mind commenting on whether those PPC servers for Bigtop CI can be
> brought up to unblock our release process?
> Thanks!
>
> Best,
> Evans

Re: PPC CI server failure

Posted by Evans Ye <ev...@apache.org>.
Hi rbkrishn,

Would you mind commenting on whether the PPC servers for Bigtop CI can be
brought back up to unblock our release process?
Thanks!

Best,
Evans

On Wed, Nov 18, 2020 at 7:26 AM Kengo Seki <se...@apache.org> wrote:

> Thank you for checking, Evans and Amir!
>
> Kengo Seki <se...@apache.org>

Re: PPC CI server failure

Posted by Kengo Seki <se...@apache.org>.
Thank you for checking, Evans and Amir!

Kengo Seki <se...@apache.org>

On Wed, Nov 18, 2020 at 2:09 AM Evans Ye <ev...@apache.org> wrote:
>
> Thank you, Amir.

Re: PPC CI server failure

Posted by Evans Ye <ev...@apache.org>.
Thank you, Amir.

On Wed, Nov 18, 2020 at 12:39 AM MrAsanjar <af...@gmail.com> wrote:

> Hi Evans, let me check with IBM again.

Re: PPC CI server failure

Posted by MrAsanjar <af...@gmail.com>.
Hi Evans, let me check with IBM again.


On Mon, Nov 16, 2020 at 9:08 PM Evans Ye <ev...@apache.org> wrote:

> Hi Amir,
>
> We're planning the Bigtop 1.5 release, and if we don't have the CI nodes for
> PPC, we won't be able to release 1.5 with PPC support.
> Could you help confirm again? Thanks!
>
> Best,
> Evans Ye

Re: PPC CI server failure

Posted by Evans Ye <ev...@apache.org>.
Hi Amir,

We're planning the Bigtop 1.5 release, and if we don't have the CI nodes for
PPC, we won't be able to release 1.5 with PPC support.
Could you help confirm again? Thanks!

Best,
Evans Ye



On Thu, Sep 17, 2020 at 8:56 PM MrAsanjar <af...@gmail.com> wrote:

> I have informed IBM management of the situation and am waiting for a
> reply.

Re: PPC CI server failure

Posted by MrAsanjar <af...@gmail.com>.
I have informed IBM management of the situation and am waiting for a reply.

On Thu, Sep 17, 2020 at 3:47 AM Evans Ye <ev...@apache.org> wrote:

> Ok. Thanks for doing this to get the ball rolling.
>
> On Thu, Sep 17, 2020 at 10:29 AM Kengo Seki <se...@apache.org> wrote:
>
> > Thank you for your help, Amir!
> > Just a heads-up: I temporarily disabled builds for ppc in the
> > following Jenkins jobs so that they can finish.
> >
> > * Docker-Puppet-Trunk
> > * Docker-Puppet-Trunk-pull
> > * Docker-Toolchain-Trunk
> > * Docker-Toolchain-Trunk-pull
> >
> > * Bigtop-trunk-packages
> > * Bigtop-trunk-repos
> >
> > * Remove-All-Docker-Containers-Except-Nexus
> > * Remove-Dangling-Docker-Images
> > * Remove-Inactive-Containers
> >
> > Kengo Seki <se...@apache.org>
> >
> > On Wed, Sep 16, 2020 at 7:35 PM Evans Ye <ev...@apache.org> wrote:
> > >
> > > Awesome! Nice to hear from you, buddy!
> > >
> > > On Wed, Sep 16, 2020 at 3:54 AM MrAsanjar <af...@gmail.com> wrote:
> > >
> > > > Hi Evans,
> > > > Let me see what I can do. Give me 24 hr :)
> > > >
> > > > On Tue, Sep 15, 2020 at 10:51 AM Evans Ye <ev...@apache.org>
> wrote:
> > > >
> > > > > Yes. I think the action is correct. However [2] might be a
> different
> > > > thing
> > > > > for PPC integration in Hadoop.
> > > > >
> > > > > Amir,
> > > > > Could you confirm?
> > > > >
> > > > > Kengo Seki <se...@apache.org> 於 2020年9月14日 週一 下午9:56寫道:
> > > > >
> > > > >> Thank you for the advice, Evans!
> > > > >> Let me confirm about "PPC machine owners". According to Amir's
> JIRA
> > > > >> issues [1][2] and the powered-by list in the OSU site [3], we're
> > using
> > > > >> a VM hosted by OSU OSL, right?
> > > > >> If it's correct, I'm going to ask them for help via
> > > > >> powerdev-request@osuosl.org.
> > > > >>
> > > > >> [1]:
> > > > >>
> > > >
> >
> https://issues.apache.org/jira/browse/INFRA-11467?focusedCommentId=15300982&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15300982
> > > > >> [2]: https://issues.apache.org/jira/browse/INFRA-12014
> > > > >> [3]:
> > > > https://osuosl.org/services/powerdev/current-projects/#foss-projects
> > > > >>
> > > > >> Kengo Seki <se...@apache.org>
> > > > >>
> > > > >>
> > > > >> On Mon, Sep 14, 2020 at 2:06 PM Evans Ye <ev...@apache.org>
> > wrote:
> > > > >> >
> > > > >> > I'd suggest to reach out to PPC machine owners. Worst case Is we
> > can
> > > > >> > temporary  drop the PPC support to move the release forward.
> > > > >> >
> > > > >> > Kengo Seki <se...@apache.org> 於 2020年9月14日 週一 12:44 寫道:
> > > > >> >
> > > > >> > > Hi everyone,
> > > > >> > >
> > > > >> > > Let me share information about the CI environment.
> > > > >> > > The worker node for ppc64le is currently offlined, so I just
> > killed
> > > > >> all
> > > > >> > > jobs
> > > > >> > > in the queue waiting for it gets back. Its status is as
> follows.
> > > > >> > >
> > > > >> > > - According to the result of `who -b`, that machine seems to
> be
> > > > >> rebooted
> > > > >> > >   on 2020-09-11 for some reason (probably unexpectedly).
> > > > >> > >
> > > > >> > > - According to the result of dmesg, the root volume was
> mounted
> > > > >> > >   in read-only mode because of a fsck failure.
> > > > >> > >
> > > > >> > >   [   34.840681] EXT4-fs (vda1): Couldn't remount RDWR because
> > of
> > > > >> > > unprocessed orphan inode list.  Please umount/remount instead
> > > > >> > >   [   60.714110] cgroup: new mount options do not match the
> > existing
> > > > >> > > superblock, will be ignored
> > > > >> > >   [  316.385805] EXT4-fs (vda1): error count since last fsck:
> > 9459
> > > > >> > >   [  316.385824] EXT4-fs (vda1): initial error at time
> > 1540294049:
> > > > >> > > ext4_validate_inode_bitmap:134
> > > > >> > >   [  316.385826] EXT4-fs (vda1): last error at time
> 1596881526:
> > > > >> > > ext4_free_inode:383
> > > > >> > >
> > > > >> > > It looks like some fsck work (and replacing the volume, if it
> > fails)
> > > > >> > > are required,
> > > > >> > > but I'm not sure if I could run something like `e2fsck -p`,
> > because
> > > > >> > > I'm also not sure
> > > > >> > > where does that machine exist or who's managing it.
> > > > >> > > (I slightly thought it was running as a VM with QEMU on some
> EC2
> > > > >> > > instance, but I couldn't find it)
> > > > >> > >
> > > > >> > > > Cos, Evans, Olaf
> > > > >> > > Would you provide any suggestions?
> > > > >> > >
> > > > >> > > Kengo Seki <se...@apache.org>
> > > > >> > >
> > > > >>
> > > > >
> > > >
> >
>

Re: PPC CI server failure

Posted by Evans Ye <ev...@apache.org>.
Ok. Thanks for doing this to get the ball rolling.


Re: PPC CI server failure

Posted by Kengo Seki <se...@apache.org>.
Thank you for your help, Amir!
Just a heads-up: I temporarily disabled the ppc builds in the
following Jenkins jobs so that the jobs can finish.

* Docker-Puppet-Trunk
* Docker-Puppet-Trunk-pull
* Docker-Toolchain-Trunk
* Docker-Toolchain-Trunk-pull

* Bigtop-trunk-packages
* Bigtop-trunk-repos

* Remove-All-Docker-Containers-Except-Nexus
* Remove-Dangling-Docker-Images
* Remove-Inactive-Containers

Kengo Seki <se...@apache.org>


Re: PPC CI server failure

Posted by Evans Ye <ev...@apache.org>.
Awesome! Nice to hear from you, buddy!


Re: PPC CI server failure

Posted by MrAsanjar <af...@gmail.com>.
Hi Evans,
Let me see what I can do. Give me 24 hr :)


Re: PPC CI server failure

Posted by Evans Ye <ev...@apache.org>.
Yes, I think the action is correct. However, [2] (INFRA-12014) might be a
different thing, for PPC integration in Hadoop.

Amir,
Could you confirm?


Re: PPC CI server failure

Posted by Kengo Seki <se...@apache.org>.
Thank you for the advice, Evans!
Let me confirm who the "PPC machine owners" are. According to Amir's JIRA
issues [1][2] and the powered-by list on the OSU site [3], we're using
a VM hosted by OSU OSL, right?
If that's correct, I'm going to ask them for help via powerdev-request@osuosl.org.

[1]: https://issues.apache.org/jira/browse/INFRA-11467?focusedCommentId=15300982&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15300982
[2]: https://issues.apache.org/jira/browse/INFRA-12014
[3]: https://osuosl.org/services/powerdev/current-projects/#foss-projects

Kengo Seki <se...@apache.org>



Re: PPC CI server failure

Posted by Evans Ye <ev...@apache.org>.
I'd suggest reaching out to the PPC machine owners. In the worst case, we can
temporarily drop PPC support to move the release forward.

On Mon, Sep 14, 2020 at 12:44, Kengo Seki <se...@apache.org> wrote:

> Hi everyone,
>
> Let me share information about the CI environment.
> The worker node for ppc64le is currently offlined, so I just killed all jobs
> in the queue waiting for it gets back. Its status is as follows.
>
> - According to the result of `who -b`, that machine seems to be rebooted
>   on 2020-09-11 for some reason (probably unexpectedly).
>
> - According to the result of dmesg, the root volume was mounted
>   in read-only mode because of a fsck failure.
>
>   [   34.840681] EXT4-fs (vda1): Couldn't remount RDWR because of unprocessed orphan inode list.  Please umount/remount instead
>   [   60.714110] cgroup: new mount options do not match the existing superblock, will be ignored
>   [  316.385805] EXT4-fs (vda1): error count since last fsck: 9459
>   [  316.385824] EXT4-fs (vda1): initial error at time 1540294049: ext4_validate_inode_bitmap:134
>   [  316.385826] EXT4-fs (vda1): last error at time 1596881526: ext4_free_inode:383
>
> It looks like some fsck work (and replacing the volume, if it fails) are required,
> but I'm not sure if I could run something like `e2fsck -p`, because I'm also not sure
> where does that machine exist or who's managing it.
> (I slightly thought it was running as a VM with QEMU on some EC2 instance, but I couldn't find it)
>
> > Cos, Evans, Olaf
> Would you provide any suggestions?
>
> Kengo Seki <se...@apache.org>
>
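
A minimal sketch, assuming the "initial error at time" and "last error at time"
values in the dmesg output quoted above are Unix epoch seconds, for converting
them to UTC dates. The names EXT4_ERROR_TIMES and to_utc are illustrative only.

```python
from datetime import datetime, timezone

# Epoch values copied from the EXT4-fs dmesg lines quoted above.
EXT4_ERROR_TIMES = {
    "initial error": 1540294049,
    "last error": 1596881526,
}


def to_utc(epoch_seconds: int) -> str:
    """Format Unix epoch seconds as an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


for label, epoch in EXT4_ERROR_TIMES.items():
    print(f"{label}: {to_utc(epoch)}")
```

Under that assumption, this prints 2018-10-23T11:27:29+00:00 for the initial
error and 2020-08-08T10:12:06+00:00 for the last one. Both fall before the
2020-09-11 reboot, which would suggest the errors had been accumulating on the
volume for some time rather than starting with the reboot.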