Posted to dev@openoffice.apache.org by Damjan Jovanovic <da...@apache.org> on 2016/01/29 08:18:49 UTC

Let's fix the Windows build bots

Hi

I aim to get our build bots working again.

So far I've committed a patch to fix #126787 (build breaking regression in
main/sal due to <prewin.h> and <postwin.h> being included) in r1727413, and
deleted an unused empty file that the RAT scan was treating as missing the
ASL in r1727463. The Windows aoo-win7 build bot now gets further,
infamously breaking in apr, but now also breaking in jpeg (
https://ci.apache.org/builders/aoo-win7/builds/156/steps/build.pl%20--all/logs/stdio
).

The error for the jpeg module is:
dmake:  Error: -- `./wntmsci12.pro/misc/52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.unpack' not found, and can't be made

What looks interesting, if you get the stdio log and look for the "Entering
..." lines and sort and count them, is that the only modules that are
entered more than once are the same ones that the build breaks in: apr and
jpeg:

$ grep Entering /tmp/stdio | sort | uniq -c | grep "^      1 " -v
      2 Entering /cygdrive/e/slave14/aoo-win7/build/ext_libraries/apr
      2 Entering /cygdrive/e/slave14/aoo-win7/build/main/jpeg

Is it breaking because build.pl is trying to build those modules more than
once (or is it re-entering them because they broke the build)?
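
In case anyone wants to repeat the count without a Cygwin shell, here is a
minimal Python sketch of the same idea. It assumes the stdio log has been
saved locally and that every module entry is a line starting with
"Entering " followed by the path, as in the output above. The function name
and the sample paths are made up for illustration.

```python
from collections import Counter


def repeated_modules(log_lines):
    """Return {path: count} for modules that were entered more than once.

    Assumes each entry is a line of the form "Entering <path>".
    """
    counts = Counter()
    prefix = "Entering "
    for line in log_lines:
        line = line.strip()
        if line.startswith(prefix):
            counts[line[len(prefix):]] += 1
    return {path: n for path, n in counts.items() if n > 1}


# Small self-contained demonstration with made-up log lines.
sample_log = [
    "Entering /build/ext_libraries/apr",
    "Entering /build/main/jpeg",
    "Entering /build/main/sal",
    "Entering /build/ext_libraries/apr",
    "Entering /build/main/jpeg",
]
for path, n in sorted(repeated_modules(sample_log).items()):
    print(n, path)
```

For a real log, something like repeated_modules(open("/tmp/stdio")) should
give the same modules as the grep pipeline.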

Our build page https://ci.apache.org/projects/openoffice/index.html is also
messed up - eg. "Windows Nightly Build Logs" is listed under "FreeBSD
nightly Install Packages". How do we edit it?

What else? How do we get SSH access to the build bots for further debugging?

Thank you
Damjan

Re: Let's fix the Windows build bots

Posted by Kay Schenk <ka...@gmail.com>.

On 02/04/2016 10:11 AM, Damjan Jovanovic wrote:
> On Thu, Feb 4, 2016 at 7:42 PM, Kay Schenk <ka...@gmail.com> wrote:
> 
>>
>> On 02/04/2016 06:05 AM, Damjan Jovanovic wrote:
>>> On Wed, Feb 3, 2016 at 1:48 PM, Jochen Nitschke <j....@ok.de>
>> wrote:
>>>
>>>> On Wed, 3 Feb 2016 13:15:23 +0200 Damjan Jovanovic wrote:
>>>>> So builds@ replied saying the buildbot was broken by an earlier bad
>>>> commit
>>>>> (not mine) and should be working now, and it was, but I ended up
>> needing
>>>> to
>>>>> commit another patch (cydrive -> cygdrive typo in the path to "svn
>>>> info").
>>>>> That 1 byte patch cannot be wrong, yet the buildbot isn't using it now,
>>>>> despite committing it hours ago.
>>>>>
>>>>> The website changes also haven't taken.
>>>> but hey, its progress :-)
>>>>
>>>> could you follow the flow?
>>>> commit
>>>> wait 5 mins
>>>> pull changes
>>>> buildbot checkconfig  (on your local config files?)
>>>> buildbot reconfig (wouldn't know how)
>>>>
>>>>
>>> After mailing builds@ and infrastructure@, I ran into other IRC users on
>>> #asftest, one of which (pono) helped, restarting the buildmaster. That
>> got
>>> it using the latest commits, which finally fixed "svn info" on aoo-win7
>> :-).
>>>
>>> I then made further commits with the same "svn info" improvement and
>>> bootstrap improvement for aoo-w7snap, and another on aoo-win7 to use "svn
>>> export" instead of rsync to generate the build directory to test the
>> theory
>>> that the use of rsync is what breaks apr later (
>>> https://issues.apache.org/jira/browse/INFRA-10481), but the commits
>> weren't
>>> going through again. Pono investigated, eventually finding the buildbot
>>> configuration was up to date, but the buildbot wasn't using it; by
>>> reloading the config (not sure how) (which caused the ooo-bot to
>>> temporarily disconnect from IRC) it started using them again.
>>>
>>> My aoo-w7snap fixes went through and "svn info" is now working there too,
>>> but my aoo-win7 changes were less successful: my new svn export command
>> was
>>> preceded by "rm -rf build" which fails in the infamous apr module:
>>>
>>> rm: cannot remove `build/ext_libraries/apr/
>>> wntmsci12.pro/misc/build/apr-1.4.5/Makefile.win': Device or resource
>> busy
>>> (
>>>
>> https://ci.apache.org/builders/aoo-win7/builds/173/steps/svn%20export/logs/stdio
>>> )
>>>
>>> "Device or resource busy" is the same error that later breaks building
>> apr,
>>> and they seem related. I asked pono to look into what has that file
>> opened
>>> or locked and am waiting for that and other maintenance on aoo-win7.
>>>
>>
>> OK, great. Yes, this locking/resource busy business continues to be
>> a problem with this buildbot.
>>
>>
> And the aoo-w7snap buildbot.
> 
> 
>> Thank you SO much for trudging on with this. I see you also put a
>> custom remove in for the linux-32 nightly buildbot with its own
>> custom timeout so hopefully we can get past this snag as well.
>>
>>
> Pleasure. Yes, the SVN() buildbot command doesn't apply its timeout to the
> subcommands it runs internally, so its "rm -rf" only gets 120 seconds which
> isn't long enough. The openoffice-linux64-nightly has a similar timeout
> issue with "cp" every now and then, which I am not sure how to fix. Maybe
> we should just do svn checkouts the aoo-win7 way: running "svn co"
> ourselves?

At this point, I would say just try whatever you think would work! :)

> 
> 
>> Did pono have any words of wisdom regarding how to make our commits
>> actually *happen*, or do we need to request a restart of specific
>> buildmaster each time?
>>
> 
> No, but after I committed my openoffice-linux32-nightly change and started
> a rebuild of that buildbot (5 minutes and 18 seconds later), that change
> was being used, so I guess that's working now.
> 

-- 
--------------------------------------------
MzK

"Though no one can go back and make a brand new start,
 anyone can start from now and make a brand new ending."
                            -- Carl Bard

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
For additional commands, e-mail: dev-help@openoffice.apache.org


Re: Let's fix the Windows build bots

Posted by Damjan Jovanovic <da...@apache.org>.
On Thu, Feb 4, 2016 at 7:42 PM, Kay Schenk <ka...@gmail.com> wrote:

>
> On 02/04/2016 06:05 AM, Damjan Jovanovic wrote:
> > On Wed, Feb 3, 2016 at 1:48 PM, Jochen Nitschke <j....@ok.de>
> wrote:
> >
> >> On Wed, 3 Feb 2016 13:15:23 +0200 Damjan Jovanovic wrote:
> >>> So builds@ replied saying the buildbot was broken by an earlier bad
> >> commit
> >>> (not mine) and should be working now, and it was, but I ended up
> needing
> >> to
> >>> commit another patch (cydrive -> cygdrive typo in the path to "svn
> >> info").
> >>> That 1 byte patch cannot be wrong, yet the buildbot isn't using it now,
> >>> despite committing it hours ago.
> >>>
> >>> The website changes also haven't taken.
> >> but hey, its progress :-)
> >>
> >> could you follow the flow?
> >> commit
> >> wait 5 mins
> >> pull changes
> >> buildbot checkconfig  (on your local config files?)
> >> buildbot reconfig (wouldn't know how)
> >>
> >>
> > After mailing builds@ and infrastructure@, I ran into other IRC users on
> > #asftest, one of which (pono) helped, restarting the buildmaster. That
> got
> > it using the latest commits, which finally fixed "svn info" on aoo-win7
> :-).
> >
> > I then made further commits with the same "svn info" improvement and
> > bootstrap improvement for aoo-w7snap, and another on aoo-win7 to use "svn
> > export" instead of rsync to generate the build directory to test the
> theory
> > that the use of rsync is what breaks apr later (
> > https://issues.apache.org/jira/browse/INFRA-10481), but the commits
> weren't
> > going through again. Pono investigated, eventually finding the buildbot
> > configuration was up to date, but the buildbot wasn't using it; by
> > reloading the config (not sure how) (which caused the ooo-bot to
> > temporarily disconnect from IRC) it started using them again.
> >
> > My aoo-w7snap fixes went through and "svn info" is now working there too,
> > but my aoo-win7 changes were less successful: my new svn export command
> was
> > preceded by "rm -rf build" which fails in the infamous apr module:
> >
> > rm: cannot remove `build/ext_libraries/apr/
> > wntmsci12.pro/misc/build/apr-1.4.5/Makefile.win': Device or resource
> busy
> > (
> >
> https://ci.apache.org/builders/aoo-win7/builds/173/steps/svn%20export/logs/stdio
> > )
> >
> > "Device or resource busy" is the same error that later breaks building
> apr,
> > and they seem related. I asked pono to look into what has that file
> opened
> > or locked and am waiting for that and other maintenance on aoo-win7.
> >
>
> OK, great. Yes, this locking/resource busy business continues to be
> a problem with this buildbot.
>
>
And the aoo-w7snap buildbot.


> Thank you SO much for trudging on with this. I see you also put a
> custom remove in for the linux-32 nightly buildbot with its own
> custom timeout so hopefully we can get past this snag as well.
>
>
Pleasure. Yes, the SVN() buildbot command doesn't apply its timeout to the
subcommands it runs internally, so its "rm -rf" only gets 120 seconds which
isn't long enough. The openoffice-linux64-nightly has a similar timeout
issue with "cp" every now and then, which I am not sure how to fix. Maybe
we should just do svn checkouts the aoo-win7 way: running "svn co"
ourselves?
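
To make the timeout point concrete, here is a rough sketch in plain Python
(this is not buildbot's API) of running the cleanup as its own command with
its own time limit, instead of inheriting a shorter built-in one. The helper
name and the 3600 second value are assumptions for illustration only.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv and return its exit code, or None if it ran too long."""
    try:
        completed = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before re-raising.
        return None
    return completed.returncode


# Demonstration with a harmless command standing in for the real cleanup,
# which would be something like ["rm", "-rf", "build"] with a generous limit.
exit_code = run_with_timeout([sys.executable, "-c", "pass"], 3600)
print("exit code:", exit_code)
```

The point is only that the limit belongs to the command we run ourselves, so
it can be set as high as the slowest cleanup needs.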


> Did pono have any words of wisdom regarding how to make our commits
> actually *happen*, or do we need to request a restart of specific
> buildmaster each time?
>

No, but after I committed my openoffice-linux32-nightly change and started
a rebuild of that buildbot (5 minutes and 18 seconds later), that change
was being used, so I guess that's working now.

Re: Let's fix the Windows build bots

Posted by Kay Schenk <ka...@gmail.com>.
On 02/04/2016 06:05 AM, Damjan Jovanovic wrote:
> On Wed, Feb 3, 2016 at 1:48 PM, Jochen Nitschke <j....@ok.de> wrote:
> 
>> On Wed, 3 Feb 2016 13:15:23 +0200 Damjan Jovanovic wrote:
>>> So builds@ replied saying the buildbot was broken by an earlier bad
>> commit
>>> (not mine) and should be working now, and it was, but I ended up needing
>> to
>>> commit another patch (cydrive -> cygdrive typo in the path to "svn
>> info").
>>> That 1 byte patch cannot be wrong, yet the buildbot isn't using it now,
>>> despite committing it hours ago.
>>>
>>> The website changes also haven't taken.
>> but hey, its progress :-)
>>
>> could you follow the flow?
>> commit
>> wait 5 mins
>> pull changes
>> buildbot checkconfig  (on your local config files?)
>> buildbot reconfig (wouldn't know how)
>>
>>
> After mailing builds@ and infrastructure@, I ran into other IRC users on
> #asftest, one of which (pono) helped, restarting the buildmaster. That got
> it using the latest commits, which finally fixed "svn info" on aoo-win7 :-).
> 
> I then made further commits with the same "svn info" improvement and
> bootstrap improvement for aoo-w7snap, and another on aoo-win7 to use "svn
> export" instead of rsync to generate the build directory to test the theory
> that the use of rsync is what breaks apr later (
> https://issues.apache.org/jira/browse/INFRA-10481), but the commits weren't
> going through again. Pono investigated, eventually finding the buildbot
> configuration was up to date, but the buildbot wasn't using it; by
> reloading the config (not sure how) (which caused the ooo-bot to
> temporarily disconnect from IRC) it started using them again.
> 
> My aoo-w7snap fixes went through and "svn info" is now working there too,
> but my aoo-win7 changes were less successful: my new svn export command was
> preceded by "rm -rf build" which fails in the infamous apr module:
> 
> rm: cannot remove `build/ext_libraries/apr/
> wntmsci12.pro/misc/build/apr-1.4.5/Makefile.win': Device or resource busy
> (
> https://ci.apache.org/builders/aoo-win7/builds/173/steps/svn%20export/logs/stdio
> )
> 
> "Device or resource busy" is the same error that later breaks building apr,
> and they seem related. I asked pono to look into what has that file opened
> or locked and am waiting for that and other maintenance on aoo-win7.
> 

OK, great. Yes, this locking/resource busy business continues to be
a problem with this buildbot.

Thank you SO much for trudging on with this. I see you also put a
custom remove in for the linux-32 nightly buildbot with its own
custom timeout so hopefully we can get past this snag as well.

Did pono have any words of wisdom on how to make our commits actually
*take effect*, or do we need to request a restart of the buildmaster
each time?

-- 
--------------------------------------------
MzK

"Though no one can go back and make a brand new start,
 anyone can start from now and make a brand new ending."
                            -- Carl Bard



Re: Let's fix the Windows build bots

Posted by Damjan Jovanovic <da...@apache.org>.
On Wed, Feb 3, 2016 at 1:48 PM, Jochen Nitschke <j....@ok.de> wrote:

> On Wed, 3 Feb 2016 13:15:23 +0200 Damjan Jovanovic wrote:
> > So builds@ replied saying the buildbot was broken by an earlier bad
> commit
> > (not mine) and should be working now, and it was, but I ended up needing
> to
> > commit another patch (cydrive -> cygdrive typo in the path to "svn
> info").
> > That 1 byte patch cannot be wrong, yet the buildbot isn't using it now,
> > despite committing it hours ago.
> >
> > The website changes also haven't taken.
> but hey, its progress :-)
>
> could you follow the flow?
> commit
> wait 5 mins
> pull changes
> buildbot checkconfig  (on your local config files?)
> buildbot reconfig (wouldn't know how)
>
>
After mailing builds@ and infrastructure@, I ran into other IRC users on
#asftest, one of whom (pono) helped by restarting the buildmaster. That got
it using the latest commits, which finally fixed "svn info" on aoo-win7 :-).

I then made further commits with the same "svn info" improvement and a
bootstrap improvement for aoo-w7snap, and another on aoo-win7 to use "svn
export" instead of rsync to generate the build directory, to test the theory
that the use of rsync is what breaks apr later (
https://issues.apache.org/jira/browse/INFRA-10481), but the commits weren't
going through again. Pono investigated and eventually found that the buildbot
configuration was up to date but the buildbot wasn't using it. Reloading the
config (I'm not sure how), which caused the ooo-bot to temporarily disconnect
from IRC, got it using the commits again.

My aoo-w7snap fixes went through and "svn info" is now working there too,
but my aoo-win7 changes were less successful: my new svn export command was
preceded by "rm -rf build" which fails in the infamous apr module:

rm: cannot remove `build/ext_libraries/apr/wntmsci12.pro/misc/build/apr-1.4.5/Makefile.win': Device or resource busy
(https://ci.apache.org/builders/aoo-win7/builds/173/steps/svn%20export/logs/stdio)

"Device or resource busy" is the same error that later breaks building apr,
and the two seem related. I asked pono to look into which process has that
file open or locked, and am waiting for that and other maintenance on aoo-win7.
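
If the lock turns out to be transient, a removal step could retry a few times
before giving up. The following is only a sketch under that assumption; it
would not help if the file stays locked until the machine is restarted. The
function name, attempt count and delay are invented for this example.

```python
import shutil
import time


def remove_tree_with_retries(path, attempts=3, delay_seconds=1.0):
    """Try to delete a directory tree, retrying when an OSError occurs.

    Returns True once the tree is gone, False if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Nothing (left) to remove, so treat it as success.
            return True
        except OSError as exc:
            # Covers errors such as "Device or resource busy".
            print("attempt %d failed: %s" % (attempt + 1, exc))
            if attempt + 1 < attempts:
                time.sleep(delay_seconds)
    return False
```

A False result would at least say clearly that the tree is still locked,
rather than leaving a half-deleted build directory behind.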

Re: Let's fix the Windows build bots

Posted by Jochen Nitschke <j....@ok.de>.
On Wed, 3 Feb 2016 13:15:23 +0200 Damjan Jovanovic wrote:
> So builds@ replied saying the buildbot was broken by an earlier bad commit
> (not mine) and should be working now, and it was, but I ended up needing to
> commit another patch (cydrive -> cygdrive typo in the path to "svn info").
> That 1 byte patch cannot be wrong, yet the buildbot isn't using it now,
> despite committing it hours ago.
>
> The website changes also haven't taken.
but hey, it's progress :-)

could you follow the flow?
commit
wait 5 mins
pull changes
buildbot checkconfig  (on your local config files?)
buildbot reconfig (wouldn't know how)


Re: Let's fix the Windows build bots

Posted by Damjan Jovanovic <da...@apache.org>.
On Tue, Feb 2, 2016 at 5:39 AM, Damjan Jovanovic <da...@apache.org> wrote:

>
>
> On Mon, Feb 1, 2016 at 2:43 AM, j.nitschke@ok.de <j....@ok.de> wrote:
>
>> On Sun, 31 Jan 2016 10:07:33 -0800 Kay Schenk wrote:
>> > On Fri, Jan 29, 2016 at 10:04 PM, Damjan Jovanovic <da...@apache.org>
>> > wrote:
>>
>> >> I think the latter is true: it's re-entering them because they broke
>> the
>> >> build.
>> >>
>> >> Yesterday jpeg failed to download and broke the build, and today nss
>> failed
>> >> to download and also broke the build. Why are downloads often failing
>> >> during ./bootstrap? Can we get a local mirror or cache external
>> >> dependencies somehow? Also, why does ./bootstrap seem to return an exit
>> >> code of 0 when needed dependencies can't be downloaded, and why is
>> >> haltOnFailure=False for the Windows buildbots but True for other
>> platforms?
>> >>
>> > No idea about this setting. You might look at the svn logs for our
>> config
>> > to see if there's any information there. Otherwise, just change it if
>> you
>> > feel inclined.
>> Not sure if you are talking about a global setting or just for the
>> 'build.pl --all' step.
>> But a global haltOnFailure would stop the build at 'svn info' step.
>>
>>
> I meant just for the ./bootstrap step.
>
>
>> About the 'svn info' step, this one stopped working between 27.07.2015
>> and 03.08.2015 as one can see in the snapshot builds #7 and #8 which run
>> exact the same command but later fails.
>> https://ci.apache.org/builders/aoo-w7snap/builds/7
>> https://ci.apache.org/builders/aoo-w7snap/builds/8
>>
>>
> The error on
> https://ci.apache.org/builders/aoo-win7/builds/159/steps/svn%20info/logs/stdio
> is:
>
> 'svn' is not recognized as an internal or external command,
> operable program or batch file.
>
> As of r979234 I started running "svn info" under Cygwin like is already
> done for "svn co". Let's see if it helps. Is there some way to request an
> immediate rebuild?
>
>
So builds@ replied saying the buildbot was broken by an earlier bad commit
(not mine) and should be working now, and it was, but I ended up needing to
commit another patch (cydrive -> cygdrive typo in the path to "svn info").
That 1 byte patch cannot be wrong, yet the buildbot still isn't using it,
despite my having committed it hours ago.

The website changes also haven't taken effect.

Re: Let's fix the Windows build bots

Posted by Damjan Jovanovic <da...@apache.org>.
On Mon, Feb 1, 2016 at 2:43 AM, j.nitschke@ok.de <j....@ok.de> wrote:

> On Sun, 31 Jan 2016 10:07:33 -0800 Kay Schenk wrote:
> > On Fri, Jan 29, 2016 at 10:04 PM, Damjan Jovanovic <da...@apache.org>
> > wrote:
>
> >> I think the latter is true: it's re-entering them because they broke the
> >> build.
> >>
> >> Yesterday jpeg failed to download and broke the build, and today nss
> failed
> >> to download and also broke the build. Why are downloads often failing
> >> during ./bootstrap? Can we get a local mirror or cache external
> >> dependencies somehow? Also, why does ./bootstrap seem to return an exit
> >> code of 0 when needed dependencies can't be downloaded, and why is
> >> haltOnFailure=False for the Windows buildbots but True for other
> platforms?
> >>
> > No idea about this setting. You might look at the svn logs for our
> config
> > to see if there's any information there. Otherwise, just change it if you
> > feel inclined.
> Not sure if you are talking about a global setting or just for the
> 'build.pl --all' step.
> But a global haltOnFailure would stop the build at 'svn info' step.
>
>
I meant just for the ./bootstrap step.
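
Since ./bootstrap seems to exit with 0 even when a download fails, one option
would be an extra check step right after it that fails loudly. A rough
sketch, assuming the downloads land in a single directory and that the
expected file names are known up front; the function name and the directory
layout are assumptions, not taken from the build scripts.

```python
import os


def missing_downloads(download_dir, expected_names):
    """Return the expected files that are absent or empty in download_dir."""
    missing = []
    for name in expected_names:
        path = os.path.join(download_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

A small wrapper could print the returned names and exit with a non-zero code
when the list is not empty, so that a step with haltOnFailure=True would stop
the build before build.pl trips over the missing tarball.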


> About the 'svn info' step, this one stopped working between 27.07.2015
> and 03.08.2015 as one can see in the snapshot builds #7 and #8 which run
> exact the same command but later fails.
> https://ci.apache.org/builders/aoo-w7snap/builds/7
> https://ci.apache.org/builders/aoo-w7snap/builds/8
>
>
The error on
https://ci.apache.org/builders/aoo-win7/builds/159/steps/svn%20info/logs/stdio
is:

'svn' is not recognized as an internal or external command,
operable program or batch file.

As of r979234 I started running "svn info" under Cygwin, as is already
done for "svn co". Let's see if it helps. Is there some way to request an
immediate rebuild?
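
The "not recognized" message means the Windows shell could not find svn on
its PATH. A pre-flight check along these lines could make that kind of
failure explicit before a step runs. shutil.which is standard in Python 3;
the helper name and the idea of checking a list of tools are just an
illustration.

```python
import shutil


def tools_not_on_path(tool_names):
    """Return the names from tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


# Report which of the listed tools the current environment cannot find.
print("not found:", tools_not_on_path(["svn"]))
```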

Re: Let's fix the Windows build bots

Posted by Damjan Jovanovic <da...@apache.org>.
On Mon, Mar 28, 2016 at 10:53 PM, Andrea Pescetti <pe...@apache.org> wrote:
> On 01/02/2016 j.nitschke wrote:
>>
>> But a global haltOnFailure would stop the build at 'svn info' step.
>> About the 'svn info' step, this one stopped working ...
>
>
> And we are back here, it seems. With the HTTPS fixes now in place, all the
> Linux and FreeBSD buildbots completed their build successfully, while the
> Windows one ran build --all but then suddenly stopped almost at the end,
> after delivering sc
>
> https://ci.apache.org/builders/aoo-win7/builds/231/steps/build.pl%20--all/logs/stdio
> command timed out: 20000 seconds without output, killing pid 13888
>
> and today it stopped at a very early stage that you had already fixed, the
> "svn info":
>
> https://ci.apache.org/builders/aoo-win7/builds/232/steps/svn%20export/logs/stdio
> Inappropriate ioctl for device
>
> Jochen, Damjan, do you remember how you fixed it back at the time? Or was it
> handled in the chat session with pono Damjan mentions later in this thread
> (which would basically mean that it disappeared magically after a restart
> and some magic by Infra)?

No, the build hanging problem was never fixed. It started
mysteriously, and since I never got access to the Windows build bots
despite asking infra for months and even offering to volunteer, I have
no idea what's wrong and no way to debug it or any other build bot
problem. My own Windows box builds AOO perfectly and always has.

Regards
Damjan



Re: Let's fix the Windows build bots

Posted by Jochen Nitschke <j....@ok.de>.
On Mon, 28 Mar 2016 22:53:43 +0200 Andrea Pescetti wrote:
> And we are back here, it seems. With the HTTPS fixes now in place, all
> the Linux and FreeBSD buildbots completed their build successfully,
> while the Windows one ran build --all but then suddenly stopped almost
> at the end, after delivering sc
>
> https://ci.apache.org/builders/aoo-win7/builds/231/steps/build.pl%20--all/logs/stdio
>
> command timed out: 20000 seconds without output, killing pid 13888
That's worrisome.
Why did the build step run for 15 hours?
Why did sc deliver run for over 5 hours?
Possible reasons (just speculation):
* Some antivirus slows everything down or even locks up some files
  (this could explain the lock-up in the following builds).
* I guess sc deliver is the linking step, which takes a lot of memory.
  Maybe we hit a memory limit on the machine, or it starts to use the
  hard disk.
* Maybe we use just one thread (to avoid failures due to race conditions).
  That would explain the total time, but not the long linking time.
* Maybe the resources assigned to the VM (I/O, CPU and memory) are too
  limited for a build.

I guess that settling these 'maybe's needs online access during a build.
Who wants to watch the build process for 15 hours? :-p

> and today it stopped at a very early stage that you had already fixed,
> the "svn info":
>
> https://ci.apache.org/builders/aoo-win7/builds/232/steps/svn%20export/logs/stdio
>
> Inappropriate ioctl for device
>
That is sadly the usual behaviour. The line of interest is:
> rm: cannot remove `build/ext_libraries/apr/wntmsci12.pro/misc/build/apr-1.4.5/Makefile.win': Device or resource busy
This file is locked by some process. IIRC someone said the build tries to
restart something and ends up locking this file.

> Jochen, Damjan, do you remember how you fixed it back at the time? Or
> was it handled in the chat session with pono Damjan mentions later in
> this thread (which would basically mean that it disappeared magically
> after a restart and some magic by Infra)?
I'm afraid it was never really fixed. You need to restart the whole machine
to unlock the apr makefile so that you can start a build.
Then you have to debug the reason for the long build time.

Regards Jochen


Re: Let's fix the Windows build bots

Posted by Andrea Pescetti <pe...@apache.org>.
On 01/02/2016 j.nitschke wrote:
> But a global haltOnFailure would stop the build at 'svn info' step.
> About the 'svn info' step, this one stopped working ...

And we are back here, it seems. With the HTTPS fixes now in place, all 
the Linux and FreeBSD buildbots completed their build successfully, 
while the Windows one ran build --all but then suddenly stopped almost 
at the end, after delivering sc

https://ci.apache.org/builders/aoo-win7/builds/231/steps/build.pl%20--all/logs/stdio
command timed out: 20000 seconds without output, killing pid 13888

and today it stopped at a very early stage that you had already fixed, 
the "svn info":

https://ci.apache.org/builders/aoo-win7/builds/232/steps/svn%20export/logs/stdio
Inappropriate ioctl for device

Jochen, Damjan, do you remember how you fixed it back at the time? Or 
was it handled in the chat session with pono Damjan mentions later in 
this thread (which would basically mean that it disappeared magically 
after a restart and some magic by Infra)?

Regards,
   Andrea.



Re: Let's fix the Windows build bots

Posted by "j.nitschke@ok.de" <j....@ok.de>.
On Sun, 31 Jan 2016 10:07:33 -0800 Kay Schenk wrote:
> On Fri, Jan 29, 2016 at 10:04 PM, Damjan Jovanovic <da...@apache.org>
> wrote:
>
>> On Fri, Jan 29, 2016 at 7:25 PM, Kay Schenk <ka...@gmail.com> wrote:
>>
>>> On Thu, Jan 28, 2016 at 11:18 PM, Damjan Jovanovic <da...@apache.org>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I aim to get our build bots working again.
>>>>
>>>> So far I've committed a patch to fix #126787 (build breaking regression
>>> in
>>>> main/sal due to <prewin.h> and <postwin.h> being included) in r1727413,
>>> and
>>>> deleted an unused empty file that the RAT scan was treating as missing
>>> the
>>>> ASL in r1727463. The Windows aoo-win7 build bot now gets further,
>>>> infamously breaking in apr, but now also breaking in jpeg (
>>>>
>>>>
>> https://ci.apache.org/builders/aoo-win7/builds/156/steps/build.pl%20--all/logs/stdio
>>>> ).
>>>>
>>>> The error for the jpeg module is:
>>>> dmake:  Error: -- `./
>>>> wntmsci12.pro/misc/52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.unpack'
>> not
>>>> found, and can't be made
>>>>
>>>> What looks interesting, if you get the stdio log and look for the
>>> "Entering
>>>> ..." lines and sort and count them, is that the only modules that are
>>>> entered more than once are the same ones that the build breaks in: apr
>>> and
>>>> jpeg:
>>>>
>>>> $ grep Entering /tmp/stdio | sort | uniq -c | grep "^      1 " -v
>>>>       2 Entering /cygdrive/e/slave14/aoo-win7/build/ext_libraries/apr
>>>>       2 Entering /cygdrive/e/slave14/aoo-win7/build/main/jpeg
>>>>
>>>> Is it breaking because build.pl is trying to build those modules more
>>> than
>>>> once (or is it re-entering them because they broke the build)?
>>>>
>>> Good that some more eyes are looking into this. I haven't looked at last
>>> night's runs yet but for the last week, I have been making the buildbots
>> a
>>> priority also.
>>>
>>>
>> Thank you.
>>
>> I think the latter is true: it's re-entering them because they broke the
>> build.
>>
>> Yesterday jpeg failed to download and broke the build, and today nss failed
>> to download and also broke the build. Why are downloads often failing
>> during ./bootstrap? Can we get a local mirror or cache external
>> dependencies somehow? Also, why does ./bootstrap seem to return an exit
>> code of 0 when needed dependencies can't be downloaded, and why is
>> haltOnFailure=False for the Windows buildbots but True for other platforms?
>>
> No idea about this setting. You might look at the svn logs for our config
> to see if there's any information there. Otherwise, just change it if you
> feel inclined.
Not sure if you are talking about a global setting or just for the
'build.pl --all' step.
But a global haltOnFailure would stop the build at 'svn info' step.

About the 'svn info' step, this one stopped working between 27.07.2015
and 03.08.2015, as one can see in the snapshot builds #7 and #8, which run
exactly the same command, but the later one fails.
https://ci.apache.org/builders/aoo-w7snap/builds/7
https://ci.apache.org/builders/aoo-w7snap/builds/8

>
> Also the ASF does have a Win 8 buildbot setup if we think we should move to
> that.
>
> Finally, I think some of the svn messages we're seeing are due  to the
> specific configuration of svn on these systems, and not changeable by us.
If the cwiki page is up to date, there might have been a bad commit in
the past, after which none of the new changes got applied.

https://cwiki.apache.org/confluence/display/INFRA/ASF+Buildbot+Configuration+update+workflow

You could test with an innocent change, like 'svn co' to 'svn checkout',
and see if it gets applied.
>
>
>
>
>>
>>>
>>>
>>>
>>>> Our build page https://ci.apache.org/projects/openoffice/index.html is
>>>> also
>>>> messed up - eg. "Windows Nightly Build Logs" is listed under "FreeBSD
>>>> nightly Install Packages". How do we edit it?
>>>>
>>> Ok, yeah, it's a bit of odd formatting. I understand what you're saying.
>>> Look at the other areas and you'll see that the HRs come under the Build
>>> Logs headings. So this is not just a Windows anomaly.
>>>
>>> We can definitely change this.
>>>
>>> Here is the svn info for this page if you want to change things:
>>>
>>>
>>>
>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/public_html/projects/openoffice
>>> Have fun.
>>>
>>>
>> I made several changes, but they aren't being applied. When/how does the
>> create-ooo-snapshots-index.sh script create the webpage?
>>
>>
>>>
>>>> What else? How do we get SSH access to the build bots for further
>>>> debugging?
>>>>
>>> Our buildbot script, openofficeorg.conf is at:
>>>
>>>
>>>
>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects
>>> along with buildbots for other projects.
>>>
>>> The ASF is running buildbot 0.8.9. We don't have direct access to the
>>> buildbot through SSH. (You need to be part of infra to do that). If you
>>> look at the buildbot documents, you'll see that there are some tools for
>>> directly accessing the buildbot's directories, though.
>>>
>>>
>>>
>>>> Thank you
>>>> Damjan
>>>>
>>> Thank you!
>>>
>>>
>>>
>>>
>>> --
>>> ----------------------------------------------------------------------
>>> MzK
>>>
>>> "Though no one can go back and make a brand new start,
>>>  anyone can start from now and make a brand new ending."
>>>                                                           -- Carl Bard


Re: Let's fix the Windows build bots

Posted by Kay Schenk <ka...@gmail.com>.
On Fri, Jan 29, 2016 at 10:04 PM, Damjan Jovanovic <da...@apache.org>
wrote:

> On Fri, Jan 29, 2016 at 7:25 PM, Kay Schenk <ka...@gmail.com> wrote:
>
> > On Thu, Jan 28, 2016 at 11:18 PM, Damjan Jovanovic <da...@apache.org>
> > wrote:
> >
> > > Hi
> > >
> > > I aim to get our build bots working again.
> > >
> > > So far I've committed a patch to fix #126787 (build breaking regression
> > in
> > > main/sal due to <prewin.h> and <postwin.h> being included) in r1727413,
> > and
> > > deleted an unused empty file that the RAT scan was treating as missing
> > the
> > > ASL in r1727463. The Windows aoo-win7 build bot now gets further,
> > > infamously breaking in apr, but now also breaking in jpeg (
> > >
> > >
> >
> https://ci.apache.org/builders/aoo-win7/builds/156/steps/build.pl%20--all/logs/stdio
> > > ).
> > >
> > > The error for the jpeg module is:
> > > dmake:  Error: -- `./
> > > wntmsci12.pro/misc/52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.unpack'
> not
> > > found, and can't be made
> > >
> > > What looks interesting, if you get the stdio log and look for the
> > "Entering
> > > ..." lines and sort and count them, is that the only modules that are
> > > entered more than once are the same ones that the build breaks in: apr
> > and
> > > jpeg:
> > >
> > > $ grep Entering /tmp/stdio | sort | uniq -c | grep "^      1 " -v
> > >       2 Entering /cygdrive/e/slave14/aoo-win7/build/ext_libraries/apr
> > >       2 Entering /cygdrive/e/slave14/aoo-win7/build/main/jpeg
> > >
> > > Is it breaking because build.pl is trying to build those modules more
> > than
> > > once (or is it re-entering them because they broke the build)?
> > >
> >
> > Good that some more eyes are looking into this. I haven't looked at last
> > night's runs yet but for the last week, I have been making the buildbots
> a
> > priority also.
> >
> >
> Thank you.
>
> I think the latter is true: it's re-entering them because they broke the
> build.
>
> Yesterday jpeg failed to download and broke the build, and today nss failed
> to download and also broke the build. Why are downloads often failing
> during ./bootstrap? Can we get a local mirror or cache external
> dependencies somehow? Also, why does ./bootstrap seem to return an exit
> code of 0 when needed dependencies can't be downloaded, and why is
> haltOnFailure=False for the Windows buildbots but True for other platforms?
>

No idea about this setting. You might look at the svn logs for our config
to see if there's any information there. Otherwise, just change it if you
feel inclined.

Also the ASF does have a Win 8 buildbot setup if we think we should move to
that.

Finally, I think some of the svn messages we're seeing are due to the
specific configuration of svn on these systems, and not changeable by us.




>
>
> >
> >
> >
> > >
> > > Our build page https://ci.apache.org/projects/openoffice/index.html is
> > > also
> > > messed up - eg. "Windows Nightly Build Logs" is listed under "FreeBSD
> > > nightly Install Packages". How do we edit it?
> > >
> >
> > Ok, yeah, it's a bit of odd formatting. I understand what you're saying.
> > Look at the other areas and you'll see that the HRs come under the Build
> > Logs headings. So this is not just a Windows anomaly.
> >
> > We can definitely change this.
> >
> > Here is the svn info for this page if you want to change things:
> >
> >
> >
> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/public_html/projects/openoffice
> >
> > Have fun.
> >
> >
> I made several changes, but they aren't being applied. When/how does the
> create-ooo-snapshots-index.sh script create the webpage?
>
>
> >
> >
> > > What else? How do we get SSH access to the build bots for further
> > > debugging?
> > >
> >
> > Our buildbot script, openofficeorg.conf is at:
> >
> >
> >
> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects
> >
> > along with buildbots for other projects.
> >
> > The ASF is running buildbot 0.8.9. We don't have direct access to the
> > buildbot through SSH. (You need to be part of infra to do that.) If you
> > look at the buildbot documents, you'll see that there are some tools for
> > directly accessing the buildbot's directories, though.
> >
> >
> >
> > >
> > > Thank you
> > > Damjan
> > >
> >
> > Thank you!
> >
> >
> >
> >
> > --
> > ----------------------------------------------------------------------
> > MzK
> >
> > "Though no one can go back and make a brand new start,
> >  anyone can start from now and make a brand new ending."
> >                                                           -- Carl Bard
> >
>



-- 
----------------------------------------------------------------------
MzK

"Though no one can go back and make a brand new start,
 anyone can start from now and make a brand new ending."
                                                          -- Carl Bard

Re: Let's fix the Windows build bots

Posted by "j.nitschke@ok.de" <j....@ok.de>.
On Tue, 2 Feb 2016 04:48:17 +0200 Damjan Jovanovic wrote:
> This still isn't working: I've changed
> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/public_html/projects/openoffice/create-ooo-snapshots-index.sh
> but https://ci.apache.org/projects/openoffice/index.html hasn't
> changed in days. Please help? Damjan 
We need to open a JIRA ticket or ask on HipChat. The only information I
found is in the cwiki:

https://cwiki.apache.org/confluence/display/INFRA/ASF+Buildbot+Configuration+update+workflow


Re: Let's fix the Windows build bots

Posted by Damjan Jovanovic <da...@apache.org>.
On Sat, Jan 30, 2016 at 8:04 AM, Damjan Jovanovic <da...@apache.org> wrote:

>
>
> On Fri, Jan 29, 2016 at 7:25 PM, Kay Schenk <ka...@gmail.com> wrote:
>
>> On Thu, Jan 28, 2016 at 11:18 PM, Damjan Jovanovic <da...@apache.org>
>> wrote:
>>
>> >
>> > Our build page https://ci.apache.org/projects/openoffice/index.html is
>> > also
>> > messed up - eg. "Windows Nightly Build Logs" is listed under "FreeBSD
>> > nightly Install Packages". How do we edit it?
>> >
>>
>> Ok, yeah, it's a bit of odd formatting. I understand what you're saying.
>> Look at the other areas and you'll see that the HRs come under the Build
>> Logs headings. So this is not just a Windows anomaly.
>>
>> We can definitely change this.
>>
>> Here is the svn info for this page if you want to change things:
>>
>>
>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/public_html/projects/openoffice
>>
>> Have fun.
>>
>>
> I made several changes, but they aren't being applied. When/how does the
> create-ooo-snapshots-index.sh script create the webpage?
>
>

This still isn't working: I've changed
https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/public_html/projects/openoffice/create-ooo-snapshots-index.sh
but https://ci.apache.org/projects/openoffice/index.html hasn't changed in
days. Please help?

Damjan

Re: Let's fix the Windows build bots

Posted by Damjan Jovanovic <da...@apache.org>.
On Fri, Jan 29, 2016 at 7:25 PM, Kay Schenk <ka...@gmail.com> wrote:

> On Thu, Jan 28, 2016 at 11:18 PM, Damjan Jovanovic <da...@apache.org>
> wrote:
>
> > Hi
> >
> > I am to get our build bots working again.
> >
> > So far I've committed a patch to fix #126787 (build breaking regression
> in
> > main/sal due to <prewin.h> and <postwin.h> being included) in r1727413,
> and
> > deleted an unused empty file that the RAT scan was treating as missing
> the
> > ASL in r1727463. The Windows aoo-win7 build bot now gets further,
> > infamously breaking in apr, but now also breaking in jpeg (
> >
> >
> https://ci.apache.org/builders/aoo-win7/builds/156/steps/build.pl%20--all/logs/stdio
> > ).
> >
> > The error for the jpeg module is:
> > dmake:  Error: -- `./
> > wntmsci12.pro/misc/52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.unpack' not
> > found, and can't be made
> >
> > What looks interesting, if you get the stdio log and look for the
> "Entering
> > ..." lines and sort and count them, is that the only modules that are
> > entered more than once are the same ones that the build breaks in: apr
> and
> > jpeg:
> >
> > $ grep Entering /tmp/stdio | sort | uniq -c | grep "^      1 " -v
> >       2 Entering /cygdrive/e/slave14/aoo-win7/build/ext_libraries/apr
> >       2 Entering /cygdrive/e/slave14/aoo-win7/build/main/jpeg
> >
> > Is it breaking because build.pl is trying to build those modules more
> than
> > once (or is it re-entering them because they broke the build)?
> >
>
> Good that some more eyes are looking into this. I haven't looked at last
> night's runs yet but for the last week, I have been making the buildbots a
> priority also.
>
>
Thank you.

I think the latter is true: it's re-entering them because they broke the
build.

Yesterday jpeg failed to download and broke the build, and today nss failed
to download and also broke the build. Why are downloads often failing
during ./bootstrap? Can we get a local mirror or cache external
dependencies somehow? Also, why does ./bootstrap seem to return an exit
code of 0 when needed dependencies can't be downloaded, and why is
haltOnFailure=False for the Windows buildbots but True for other platforms?
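
One possible guard against the exit-code problem is to follow ./bootstrap
with an explicit check that the needed tarballs are really present. The
following is only a sketch under unverified assumptions: the function name
require_tarballs is made up for illustration, the 32 hex characters at the
start of each file name in ext_sources are assumed to be the MD5 sum of that
file, and md5sum is assumed to be available under Cygwin.

```shell
# Return non-zero if any named tarball is missing from DIR or fails its check.
# Usage: require_tarballs DIR FILE...
# Assumption: the part of each file name before the first "-" is the MD5 sum
# of the file's contents. This naming rule is not confirmed in this thread.
require_tarballs() {
    dir=$1
    shift
    status=0
    for name in "$@"; do
        f=$dir/$name
        if [ ! -f "$f" ]; then
            echo "missing: $name" >&2
            status=1
            continue
        fi
        # Expected checksum is the file name prefix up to the first "-".
        expected=${name%%-*}
        # md5sum on stdin prints "<hash>  -"; keep only the hash.
        actual=$(md5sum < "$f" | cut -d' ' -f1)
        if [ "$actual" != "$expected" ]; then
            echo "checksum mismatch: $name" >&2
            status=1
        fi
    done
    return $status
}
```

Running something like
"./bootstrap && require_tarballs ext_sources 52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.tar.gz"
as the step would then give a non-zero exit code when the jpeg tarball is
absent, even if ./bootstrap itself returns 0.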



>
>
>
> >
> > Our build page https://ci.apache.org/projects/openoffice/index.html is
> > also
> > messed up - eg. "Windows Nightly Build Logs" is listed under "FreeBSD
> > nightly Install Packages". How do we edit it?
> >
>
> Ok, yeah, it's a bit of odd formatting. I understand what you're saying.
> Look at the other areas and you'll see that the HRs come under the Build
> Logs headings. So this is not just a Windows anomaly.
>
> We can definitely change this.
>
> Here is the svn info for this page if you want to change things:
>
>
> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/public_html/projects/openoffice
>
> Have fun.
>
>
I made several changes, but they aren't being applied. When/how does the
create-ooo-snapshots-index.sh script create the webpage?


>
>
> > What else? How do we get SSH access to the build bots for further
> > debugging?
> >
>
> Our buildbot script, openofficeorg.conf is at:
>
>
> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects
>
> along with buildbots for other projects.
>
> The ASF is running buildbot 0.8.9. We don't have direct access to the
> buildbot through SSH. (You need to be part of infra to do that.) If you
> look at the buildbot documents, you'll see that there are some tools for
> directly accessing the buildbot's directories, though.
>
>
>
> >
> > Thank you
> > Damjan
> >
>
> Thank you!
>
>
>
>
> --
> ----------------------------------------------------------------------
> MzK
>
> "Though no one can go back and make a brand new start,
>  anyone can start from now and make a brand new ending."
>                                                           -- Carl Bard
>

Re: Let's fix the Windows build bots

Posted by Kay Schenk <ka...@gmail.com>.
On Thu, Jan 28, 2016 at 11:18 PM, Damjan Jovanovic <da...@apache.org>
wrote:

> Hi
>
> I am to get our build bots working again.
>
> So far I've committed a patch to fix #126787 (build breaking regression in
> main/sal due to <prewin.h> and <postwin.h> being included) in r1727413, and
> deleted an unused empty file that the RAT scan was treating as missing the
> ASL in r1727463. The Windows aoo-win7 build bot now gets further,
> infamously breaking in apr, but now also breaking in jpeg (
>
> https://ci.apache.org/builders/aoo-win7/builds/156/steps/build.pl%20--all/logs/stdio
> ).
>
> The error for the jpeg module is:
> dmake:  Error: -- `./
> wntmsci12.pro/misc/52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.unpack' not
> found, and can't be made
>
> What looks interesting, if you get the stdio log and look for the "Entering
> ..." lines and sort and count them, is that the only modules that are
> entered more than once are the same ones that the build breaks in: apr and
> jpeg:
>
> $ grep Entering /tmp/stdio | sort | uniq -c | grep "^      1 " -v
>       2 Entering /cygdrive/e/slave14/aoo-win7/build/ext_libraries/apr
>       2 Entering /cygdrive/e/slave14/aoo-win7/build/main/jpeg
>
> Is it breaking because build.pl is trying to build those modules more than
> once (or is it re-entering them because they broke the build)?
>

Good that some more eyes are looking into this. I haven't looked at last
night's runs yet but for the last week, I have been making the buildbots a
priority also.



>
> Our build page https://ci.apache.org/projects/openoffice/index.html is
> also
> messed up - eg. "Windows Nightly Build Logs" is listed under "FreeBSD
> nightly Install Packages". How do we edit it?
>

Ok, yeah, it's a bit of odd formatting. I understand what you're saying.
Look at the other areas and you'll see that the HRs come under the Build
Logs headings. So this is not just a Windows anomaly.

We can definitely change this.

Here is the svn info for this page if you want to change things:

https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/public_html/projects/openoffice

Have fun.



> What else? How do we get SSH access to the build bots for further
> debugging?
>

Our buildbot script, openofficeorg.conf is at:

https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects

along with buildbots for other projects.

The ASF is running buildbot 0.8.9. We don't have direct access to the
buildbot through SSH. (You need to be part of infra to do that.) If you
look at the buildbot documents, you'll see that there are some tools for
directly accessing the buildbot's directories, though.


>
> Thank you
> Damjan
>

Thank you!




-- 
----------------------------------------------------------------------
MzK

"Though no one can go back and make a brand new start,
 anyone can start from now and make a brand new ending."
                                                          -- Carl Bard

Re: Let's fix the Windows build bots

Posted by Damjan Jovanovic <da...@apache.org>.
On Fri, Jan 29, 2016 at 9:18 AM, Damjan Jovanovic <da...@apache.org> wrote:

> Hi
>
> I am to get our build bots working again.
>
> So far I've committed a patch to fix #126787 (build breaking regression in
> main/sal due to <prewin.h> and <postwin.h> being included) in r1727413, and
> deleted an unused empty file that the RAT scan was treating as missing the
> ASL in r1727463. The Windows aoo-win7 build bot now gets further,
> infamously breaking in apr, but now also breaking in jpeg (
> https://ci.apache.org/builders/aoo-win7/builds/156/steps/build.pl%20--all/logs/stdio
> ).
>
> The error for the jpeg module is:
> dmake:  Error: -- `./
> wntmsci12.pro/misc/52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.unpack' not
> found, and can't be made
>
>
I can reproduce this error on my own Windows build by deleting or renaming
ext_sources/52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.tar.gz and then
running "build" from inside main/jpeg. The problem on the buildbot appears
to be that libjpeg couldn't be downloaded (
https://ci.apache.org/builders/aoo-win7/builds/156/steps/bootstrap/logs/stdio
):

downloading to
E:/slave14/aoo-win7/build/ext_sources/52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.tar.gz.part
download from http://www.ijg.org/files/jpegsrc.v8d.tar.gz failed
    download failed
downloading to
E:/slave14/aoo-win7/build/ext_sources/52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.tar.gz.part
download from
http://sourceforge.net/projects/oooextras.mirror/files/52654eb3b2e60c35731ea8fc87f1bd29-jpeg-8d.tar.gz
failed
    download failed

Maybe it was a temporary networking glitch - 5 downloads failed in total
(maybe we need a mirror internal to Apache?). Let's see how it looks
tomorrow.
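
To compare one day's run with the next, the failed attempts in a saved
bootstrap log can be counted with a small helper. This is a sketch, assuming
the log has been saved to a local file and that every failed attempt ends
with an indented "download failed" line as in the excerpt above;
count_failed_downloads is a hypothetical name. In the excerpt jpeg is tried
from two URLs, so this counts attempts rather than tarballs.

```shell
# Print the number of failed download attempts recorded in a bootstrap log.
# Usage: count_failed_downloads LOGFILE
# Assumption: each failed attempt is reported on its own line that starts
# with optional whitespace followed by "download failed". Lines of the form
# "download from <url> failed" are deliberately not matched.
count_failed_downloads() {
    grep -c '^[[:space:]]*download failed' "$1"
}
```

For example, "count_failed_downloads /tmp/bootstrap-stdio" (the path is only
an example) prints a single number that can be tracked across nightly builds.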



> What looks interesting, if you get the stdio log and look for the
> "Entering ..." lines and sort and count them, is that the only modules that
> are entered more than once are the same ones that the build breaks in: apr
> and jpeg:
>
> $ grep Entering /tmp/stdio | sort | uniq -c | grep "^      1 " -v
>       2 Entering /cygdrive/e/slave14/aoo-win7/build/ext_libraries/apr
>       2 Entering /cygdrive/e/slave14/aoo-win7/build/main/jpeg
>
> Is it breaking because build.pl is trying to build those modules more
> than once (or is it re-entering them because they broke the build)?
>
> Our build page https://ci.apache.org/projects/openoffice/index.html is
> also messed up - eg. "Windows Nightly Build Logs" is listed under "FreeBSD
> nightly Install Packages". How do we edit it?
>
> What else? How do we get SSH access to the build bots for further
> debugging?
>
> Thank you
> Damjan
>
>