Posted to dev@geode.apache.org by Kirk Lund <kl...@apache.org> on 2016/09/04 05:29:18 UTC

Nightly Build still failing with BindExceptions

We're still hitting BindExceptions in the nightly build, so I'll go ahead
and propose this again: any test that uses AvailablePort to find a random
port could be changed to retry automatically when it fails with a
java.net.BindException. Opinions?
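
For illustration only, a minimal sketch of what such a retry could look like in
plain Java. Nothing here is Geode code: the class BindRetry, the method
bindWithRetry and the portPicker parameter are hypothetical names, and
portPicker merely stands in for whatever call a test uses to obtain a
candidate port.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.util.function.IntSupplier;

public class BindRetry {

  /**
   * Opens a server socket on a port supplied by portPicker. If the bind
   * fails with BindException, a fresh port is requested and the bind is
   * tried again, up to maxAttempts times in total.
   */
  public static ServerSocket bindWithRetry(IntSupplier portPicker, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    BindException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      int port = portPicker.getAsInt();
      try {
        return new ServerSocket(port);
      } catch (BindException e) {
        // The port was taken between "pick" and "bind"; pick another one.
        last = e;
      }
    }
    // Every attempt failed with BindException; rethrow the most recent one.
    throw last;
  }
}
```

Only BindException is retried in this sketch; any other IOException still
fails the test immediately, so unrelated socket problems are not hidden.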

-Kirk

:geode-core:integrationTest

com.gemstone.gemfire.internal.cache.DiskRegionJUnitTest > testBridgeServerRunningInSynchPersistOnlyForIOExceptionCase FAILED
    java.net.BindException: Failed to create server socket on  null[5,555]
        at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:814)
        at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:774)
        at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:738)
        at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl.<init>(AcceptorImpl.java:470)
        at com.gemstone.gemfire.internal.cache.CacheServerImpl.start(CacheServerImpl.java:323)
        at com.gemstone.gemfire.internal.cache.DiskRegionJUnitTest.testBridgeServerRunningInSynchPersistOnlyForIOExceptionCase(DiskRegionJUnitTest.java:2215)

        Caused by:
        java.net.BindException: Address already in use
            at java.net.PlainSocketImpl.socketBind(Native Method)
            at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
            at java.net.ServerSocket.bind(ServerSocket.java:375)
            at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:811)
            ... 5 more

com.gemstone.gemfire.internal.cache.DiskRegionJUnitTest > testBridgeServerStoppingInSynchPersistOnlyForIOExceptionCase FAILED
    java.net.BindException: Failed to create server socket on  null[5,555]
        at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:814)
        at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:774)
        at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:738)
        at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl.<init>(AcceptorImpl.java:470)
        at com.gemstone.gemfire.internal.cache.CacheServerImpl.start(CacheServerImpl.java:323)
        at com.gemstone.gemfire.internal.cache.DiskRegionJUnitTest.testBridgeServerStoppingInSynchPersistOnlyForIOExceptionCase(DiskRegionJUnitTest.java:2103)

        Caused by:
        java.net.BindException: Address already in use
            at java.net.PlainSocketImpl.socketBind(Native Method)
            at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
            at java.net.ServerSocket.bind(ServerSocket.java:375)
            at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:811)
            ... 5 more

3247 tests completed, 2 failed, 175 skipped

Re: Nightly Build still failing with BindExceptions

Posted by Kirk Lund <kl...@pivotal.io>.
Well, if java.net.PlainSocketImpl.socketBind(Native Method) could tell us
what grabbed the port after we found it was free, then that would provide
us with very valuable information on whether or not we could prevent this.

Otherwise, AvailablePort is already selecting a not-in-use port for the
test to use.

Any sort of port reservation system would need to live in the same process
that is going to use the port, which would work for some tests but not all
of them.
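
To make the gap concrete, here is a small sketch in plain Java. It is not
AvailablePort's actual implementation; PortReservation, probeFreePort and
reservePort are hypothetical names. The first method shows the
check-then-use pattern, where the port is only known to be free at the
instant of the probe. The second shows an in-process reservation, where the
OS picks the port and the caller keeps the socket bound.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortReservation {

  /** Check-then-use: the port is known to be free only while the probe is open. */
  public static int probeFreePort() throws IOException {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    } // The probe is closed here; anything may bind the port before the caller does.
  }

  /** In-process reservation: port 0 lets the OS choose, and the socket stays bound. */
  public static ServerSocket reservePort() throws IOException {
    return new ServerSocket(0);
  }
}
```

The second form only helps when the code under test can take over an
already-bound socket in the same JVM, which is the limitation described
above.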

-Kirk

On Tue, Sep 6, 2016 at 8:00 AM, Anthony Baker <ab...@pivotal.io> wrote:

> How could we fix AvailablePort so we don’t try to use in-use ports?
>
> Anthony

Re: Nightly Build still failing with BindExceptions

Posted by Dan Smith <ds...@pivotal.io>.
I suspect many of these BindExceptions are caused by the test itself trying
to use the same port twice, rather than something else on the box grabbing
a port.

I do think the long term solution is to get rid of AvailablePort. But maybe
in the short term we could change it to not pick a random number, but
instead always start from the same point? That way if there are bugs in the
tests they would fail every time.
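
A rough sketch of that short-term idea, in plain Java rather than as a patch
to AvailablePort: ports are handed out in increasing order from a fixed
starting point, so the same run tends to produce the same sequence. The class
SequentialPorts, its method nextAvailable and the port range used with it are
all hypothetical. Like any probe, this is still check-then-use; it makes the
selection repeatable rather than race-free.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.atomic.AtomicInteger;

public class SequentialPorts {
  private final AtomicInteger next;
  private final int lastPort;

  public SequentialPorts(int firstPort, int lastPort) {
    if (firstPort < 1 || lastPort > 65535 || firstPort > lastPort) {
      throw new IllegalArgumentException("invalid port range");
    }
    this.next = new AtomicInteger(firstPort);
    this.lastPort = lastPort;
  }

  /** Returns the next port, in ascending order, that can currently be bound. */
  public int nextAvailable() {
    int candidate;
    while ((candidate = next.getAndIncrement()) <= lastPort) {
      try (ServerSocket probe = new ServerSocket(candidate)) {
        return candidate;
      } catch (IOException e) {
        // In use or otherwise not bindable; move on to the next number.
      }
    }
    throw new IllegalStateException("no free port left in range");
  }
}
```

Because a port is never handed out twice by the same instance, a test that
still collides is most likely binding one port twice on its own.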

-Dan

Re: Nightly Build still failing with BindExceptions

Posted by Jens Deppe <jd...@pivotal.io>.
We're already using that plugin to run the distributedTest task. We have a
story to also implement that for flaky tests.

--Jens

Re: Nightly Build still failing with BindExceptions

Posted by Sai Boorlagadda <sa...@gmail.com>.
+1 for dockerized tests.

Most of the CI failures caused by leftover state are not easily reproducible.
I would rather spend time eliminating these failures, and maybe dockerized
tests are the way to go.

Sai

Re: Nightly Build still failing with BindExceptions

Posted by Swapnil Bawaskar <sb...@pivotal.io>.
To make sure this is not a problem again, how about running the tests in
their own containers using something like gradle-dockerized-test-plugin[1]?
If each of our tests runs in its own container, we will be able to address
the BindException issue as well as the "state left by previous test" issue.
Sure, the tests will take longer to complete, but we only do a nightly build.
We could also run our tests in parallel in different containers to speed up
our build. We could also go one step further and get a clean slate on
"CI failure" issues. My main arguments for doing this are:
1. We have 190 issues that are marked as "ci" failures [2].
2. A lot of CI failures are due to state left behind by previous
tests. (26 are just bind exceptions[3])

Fixing 190 test issues is definitely going to slow us down from adding
features to Geode, so getting a clean slate will allow us to narrow down CI
failures to race conditions in the tests (or the product).

If we think this is a good idea, then we could check with ASF infra to see
if Docker can be set up on the Jenkins slaves.

[1] https://github.com/pedjak/gradle-dockerized-test-plugin
[2]
https://issues.apache.org/jira/browse/GEODE-1778?jql=project%20%3D%20GEODE%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20%3D%20ci
[3]
https://issues.apache.org/jira/browse/GEODE-973?jql=project%20%3D%20GEODE%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20text%20~%20%22BindException%22

Re: Nightly Build still failing with BindExceptions

Posted by Kirk Lund <kl...@pivotal.io>.
No wonder that test is intermittently failing then. I didn't think we had
any tests with hard-coded ports. I filed GEODE-1863 and Darrel picked it up.

-Kirk

On Tue, Sep 6, 2016 at 9:30 AM, Bruce Schuchardt <bs...@pivotal.io>
wrote:

> This test is not using AvailablePort.  There are two test cases in this
> class that always use port 5555.
>
>
> On 9/6/2016 at 8:00 AM, Anthony Baker wrote:
>
>> How could we fix AvailablePort so we don’t try to use in-use ports?
>>
>> Anthony
>>
>> On Sep 3, 2016, at 10:29 PM, Kirk Lund <kl...@apache.org> wrote:
>>>
>>> We're still hitting BindExceptions in the nightly build, so I'll go ahead
>>> and propose this again: any test that uses AvailablePort to find a random
>>> port could be altered to automatically Retry if it encounters and fails
>>> because of java.net.BindException. Opinions?
>>>
>>> -Kirk
>>>
>>
>

Re: Nightly Build still failing with BindExceptions

Posted by Bruce Schuchardt <bs...@pivotal.io>.
This test is not using AvailablePort.  There are two test cases in this
class that always use port 5555.
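
One way to avoid a hard-coded port such as 5555 in a test is to bind to
port 0 and let the operating system assign a free port. The sketch below
only illustrates that idea. The names EphemeralPortExample and
bindToEphemeralPort are hypothetical; this is not Geode's AvailablePort
and not necessarily how DiskRegionJUnitTest was or should be fixed.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper for illustration only; not part of Geode.
final class EphemeralPortExample {

  /**
   * Binds a server socket to port 0, which asks the OS to assign a port
   * that is free at bind time. The caller owns the socket and must close it.
   */
  static ServerSocket bindToEphemeralPort() throws IOException {
    return new ServerSocket(0);
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocket socket = bindToEphemeralPort()) {
      // getLocalPort() reports the port the OS actually assigned.
      System.out.println("Bound to port " + socket.getLocalPort());
    }
  }
}
```

Because the socket is already bound when the port number is read, there is
no window in which another process can take the port, unlike choosing a
number first and binding to it later.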

On 9/6/2016 at 8:00 AM, Anthony Baker wrote:
> How could we fix AvailablePort so we don't try to use in-use ports?
>
> Anthony
>
>> On Sep 3, 2016, at 10:29 PM, Kirk Lund <kl...@apache.org> wrote:
>>
>> We're still hitting BindExceptions in the nightly build, so I'll go ahead
>> and propose this again: any test that uses AvailablePort to find a random
>> port could be altered to automatically Retry if it encounters and fails
>> because of java.net.BindException. Opinions?
>>
>> -Kirk
>>


Re: Nightly Build still failing with BindExceptions

Posted by Anthony Baker <ab...@pivotal.io>.
How could we fix AvailablePort so we don’t try to use in-use ports?

Anthony
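
A port that was free when it was checked can be taken by another process
before the test binds to it, so a check-then-bind helper cannot fully rule
out in-use ports. As a rough sketch of the retry idea proposed in this
thread, the code below catches java.net.BindException and picks another
port. The names BindRetryExample and bindWithRetry are hypothetical, and
the IntSupplier only stands in for whatever chooses a candidate port; it
does not use AvailablePort's actual API.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.util.function.IntSupplier;

// Hypothetical helper for illustration only; not part of Geode.
final class BindRetryExample {

  /**
   * Asks portPicker for a candidate port and tries to bind to it. If the
   * bind fails with BindException, picks again, up to maxAttempts times.
   * Rethrows the last BindException if every attempt fails.
   */
  static ServerSocket bindWithRetry(IntSupplier portPicker, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    BindException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      int port = portPicker.getAsInt();
      try {
        return new ServerSocket(port);
      } catch (BindException e) {
        // The candidate port was in use at bind time; try another one.
        last = e;
      }
    }
    throw last;
  }
}
```

Retrying only on BindException keeps other I/O failures visible instead of
masking them behind repeated attempts.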

> On Sep 3, 2016, at 10:29 PM, Kirk Lund <kl...@apache.org> wrote:
> 
> We're still hitting BindExceptions in the nightly build, so I'll go ahead
> and propose this again: any test that uses AvailablePort to find a random
> port could be altered to automatically Retry if it encounters and fails
> because of java.net.BindException. Opinions?
> 
> -Kirk
> 