Posted to dev@hbase.apache.org by Sean Busbey <bu...@apache.org> on 2018/08/30 14:35:57 UTC

[DISCUSS] automated checks while asf jenkins is down

Hi folks!

As background, the ASF jenkins master (what you see when you go to
builds.apache.org) went down Tuesday night[1]. ASF Infra is dutifully
working to restore it to full capabilities without losing information
on prior builds. Their most recent estimate has a return to normal
sometime tonight or early tomorrow morning.

I have a few patches I'd like to still see precommit results for
before they get signed off on. I was going to set up checking them
with test-patch myself, but it would only be marginally more work for
me to automate it.

What do folks think about me doing this as a stop-gap while ASF Jenkins is down?

It would be on transient hosts that essentially only I had access to
(and I guess my employer?). So we'd get results posted to JIRA but
getting to actual logs would be a bit more of a pain.

Does anyone feel strongly about me using the existing QABot JIRA
credentials vs making a "I'm a placeholder QABot" account with its own
credentials?

[1]: https://status.apache.org/incidents/4zl6mkyrg8qt

Re: [DISCUSS] automated checks while asf jenkins is down

Posted by Sean Busbey <bu...@apache.org>.
So far it looks like a few nodes don't work, so I've taken them
offline temporarily. INFRA-16974 is tracking the issue.

I think each find-flaky job has now had a successful run after the restore.
On Fri, Aug 31, 2018 at 9:49 AM Sean Busbey <bu...@apache.org> wrote:
>
> looks like infra, given that all but two of the branches work. let me
> try taking H10 offline and see if it works on a different host.
> On Fri, Aug 31, 2018 at 8:05 AM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> >
> > Our flaky test jobs are not in a good state after the recovery.
> >
> > Error when executing always post condition:
> > org.jenkinsci.plugins.workflow.steps.MissingContextVariableException:
> > Required context class hudson.FilePath is missing
> >
> >
> > And this
> >
> > Caused: java.io.IOException: Remote call on H10 failed
> >
> >
> > Not sure whether this is an infra issue or a problem with our scripts...
> >
> > On Fri, Aug 31, 2018 at 1:19 PM Sean Busbey <bu...@apache.org> wrote:
> >
> > > builds is back. precommit should be working through the backlog of
> > > changes while things were down.

Re: [DISCUSS] automated checks while asf jenkins is down

Posted by Sean Busbey <bu...@apache.org>.
looks like infra, given that all but two of the branches work. let me
try taking H10 offline and see if it works on a different host.

Re: [DISCUSS] automated checks while asf jenkins is down

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
Our flaky test jobs are not in a good state after the recovery.

Error when executing always post condition:
org.jenkinsci.plugins.workflow.steps.MissingContextVariableException:
Required context class hudson.FilePath is missing


And this

Caused: java.io.IOException: Remote call on H10 failed


Not sure whether this is an infra issue or a problem with our scripts...


Re: [DISCUSS] automated checks while asf jenkins is down

Posted by Sean Busbey <bu...@apache.org>.
builds is back. precommit should be working through the backlog of
changes while things were down.

Re: [DISCUSS] automated checks while asf jenkins is down

Posted by Josh Elser <el...@apache.org>.
+1 and thanks if you want to spend the time to get something temporary 
up to unblock QA.

Different credentials would seem like a good thing re: security, but as 
long as we can discern between "official" and "temporary replacement", I 
don't mind much.

Bonus points if you can also document how you did it (since I would 
guess it will be beneficial down the road).

On 8/30/18 11:10 AM, Sean Busbey wrote:
> On Thu, Aug 30, 2018 at 9:44 AM Misty Linville <mi...@apache.org> wrote:
>>
>> I'm concerned that getting the logs will take a lot of your time. Is there
>> a way to have the automation put them somewhere where the patch author can
>> get to them when needed? Also how much time will this take to set up? Will
>> it be worth it? Will this initial setup be useful long-term in any way?
>>
> 
> 
> I don't know of a public-facing storage place I can dump the logs, so
> I don't think automating the logs is feasible short term.
> 
> It'll probably take me, on the outside, an hour longer than getting
> the stuff I'm going to do for myself done. Probably a lot less now
> that the logic of the "pull suitable issues out of JIRA" job is
> something maintained in source control over in the Yetus project.
> 
> I guess long term it'd be useful because I'd have a run guide for
> doing it again if ASF jenkins goes down?
> 

Re: [DISCUSS] automated checks while asf jenkins is down

Posted by Sean Busbey <bu...@apache.org>.
On Thu, Aug 30, 2018 at 9:44 AM Misty Linville <mi...@apache.org> wrote:
>
> I'm concerned that getting the logs will take a lot of your time. Is there
> a way to have the automation put them somewhere where the patch author can
> get to them when needed? Also how much time will this take to set up? Will
> it be worth it? Will this initial setup be useful long-term in any way?
>


I don't know of a public-facing storage place I can dump the logs, so
I don't think automating the logs is feasible short term.

It'll probably take me, on the outside, an hour longer than getting
the stuff I'm going to do for myself done. Probably a lot less now
that the logic of the "pull suitable issues out of JIRA" job is
something maintained in source control over in the Yetus project.

I guess long term it'd be useful because I'd have a run guide for
doing it again if ASF jenkins goes down?
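
For readers unfamiliar with the "pull suitable issues out of JIRA" step
mentioned above, here is a rough sketch of what such a lookup could look
like. It is not the Yetus implementation. It only assumes the standard
JIRA REST search endpoint (/rest/api/2/search) and the "Patch Available"
status. The base URL constant, the function names, and the choice of
fields are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: base URL of the ASF JIRA instance.
JIRA_BASE = "https://issues.apache.org/jira"


def build_search_url(project, status="Patch Available", max_results=50):
    """Build a JIRA REST search URL for issues awaiting precommit.

    Hypothetical helper: the JQL below is one plausible way to select
    candidate issues, not necessarily what any existing job uses.
    """
    jql = f'project = {project} AND status = "{status}" ORDER BY updated DESC'
    query = urlencode(
        {"jql": jql, "fields": "key,summary", "maxResults": max_results}
    )
    return f"{JIRA_BASE}/rest/api/2/search?{query}"


def issue_keys(search_response):
    """Extract issue keys from a decoded JSON search response (a dict)."""
    return [issue["key"] for issue in search_response.get("issues", [])]
```

A stop-gap job would fetch that URL, decode the JSON, and hand each key
from issue_keys() to test-patch. Fetching and authentication are left
out here on purpose, since they depend on which credentials end up
being used.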

Re: [DISCUSS] automated checks while asf jenkins is down

Posted by Misty Linville <mi...@apache.org>.
I'm concerned that getting the logs will take a lot of your time. Is there
a way to have the automation put them somewhere where the patch author can
get to them when needed? Also how much time will this take to set up? Will
it be worth it? Will this initial setup be useful long-term in any way?


Re: [DISCUSS] automated checks while asf jenkins is down

Posted by Andrew Purtell <ap...@apache.org>.
> It would be on transient hosts that essentially only I had access to (and
> I guess my employer?). So we'd get results posted to JIRA but getting to
> actual logs would be a bit more of a pain.

For your consideration, and probably not for this specific instance...
Setting something up in AWS that can be launched from an AMI with keys
shared among project PMC would allow multiple interested parties to chip in
for hosting. Perhaps any of us could launch instances from such an AMI and,
simply by doing so, automatically contribute to hosting test resources for
the project.




-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk