Posted to dev@accumulo.apache.org by Ed Coleman <de...@etcoleman.com> on 2017/03/11 12:06:10 UTC

Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

I had commented on https://issues.apache.org/jira/browse/ACCUMULO-4602 that I often have trouble with this and a few others.

 

Not sure it makes me feel any better, but for me, this is not "new" to 1.7.3. I thought it could be due to my virtual-box development environment, but I've tried running verify on an AWS c4.2xlarge instance with the same intermittent results. I have had it pass, but more often than not it fails.

 

To help decide if 1.7.3-rc0 could be a candidate, I made the following chart tracking IT issues, and then at one point KerberosRenewalIT passed for me (and it passed a few times in a row) and I stopped updating the chart:

 

Instance type columns (x marks an issue seen on that instance):

| AWS1 | AWS2 | AWS3 | OpenBox 1 | OpenBox 2 | OpenBox 3 | Test
|------|------|------|-----------|-----------|-----------|-----
|      |  x   |  x   |     x     |           |           | AssignmentThreadsIT.testConcurrentAssignmentPerformance:91
|      |      |      |           |           |     x     | BadDeleteMarkersCreatedIT>AccumuloClusterIT.teardownCluster:223 » TestTimedOut
|      |  x   |      |     x     |           |           | ChaoticBalancerIT.test:80->Object.wait:502->Object.wait:-2 » TestTimedOut test...
|      |      |  x   |           |           |           | ConditionalWriterIT.testTrace:1476 » TestTimedOut test timed out after 60 seco...
|  x   |  x   |  x   |     x     |           |           | DurabilityIT.testWriteSpeed:103 log should be faster than flush
|      |      |  x   |           |           |           | FateStarvationIT.run:79 » Runtime java.lang.RuntimeException: org.apache.zooke...
|  x   |  x   |  x   |     x     |     x     |     x     | KerberosRenewalIT.testReadAndWriteThroughTicketLifetime » TestTimedOut test ti...
|  x   |      |  x   |           |           |           | ShellServerIT.trace:1444
|  x   |      |      |           |     x     |           | TabletStateChangeIteratorIT.test:100 No tables should need attention expected:<0> but was:<1>
|      |      |      |           |           |           | UnorderedWorkAssignerReplicationIT.dataWasReplicatedToThePeerWithoutDrain:548 » TableOffline
|      |      |      |           |           |     x     | KerberosReplicationIT.dataReplicatedToCorrectTable:224 » TestTimedOut test tim...

 

I am seeing the same intermittent failures with 1.7.3-rc1 and 1.7.3-rc2.

 

-----Original Message-----
From: Christopher [mailto:ctubbsii@apache.org] 
Sent: Saturday, March 11, 2017 1:53 AM
To: Accumulo Dev List <de...@accumulo.apache.org>
Subject: Re: [VOTE] Accumulo 1.7.3-rc2

 

+1, reluctantly, due to KerberosRenewalIT failures described below.

 

Verified hashes/sigs/javadoc jars/source jars/git SHA1/tarball contents/license stuffs/ITs.

 

I could not get KerberosRenewalIT to pass at all (I tried half a dozen times). It keeps timing out. It looks like it's supposed to finish between 8 and 9 minutes... an insanely long time for a *single* test to be running, IMO, especially one as narrowly focused as this one (ShellServerIT, for example, runs about that long, but covers a very broad spectrum of Accumulo behavior). This test ignores the scaling parameter, too, so it cannot be scaled with the timeout.factor system property.
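For readers unfamiliar with the scaling mechanism mentioned above, here is a minimal sketch of how a test timeout might honor the timeout.factor system property. Only the property name comes from this thread; the class and method names are hypothetical, and the real Accumulo test harness may well do this differently.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not Accumulo code: scales a base timeout by the
// integer value of the "timeout.factor" system property (default 1).
final class ScaledTimeout {
  private ScaledTimeout() {}

  // Parse the factor, falling back to 1 for missing, malformed,
  // or non-positive values.
  static int parseFactor(String raw) {
    if (raw == null) {
      return 1;
    }
    try {
      int factor = Integer.parseInt(raw.trim());
      return factor > 0 ? factor : 1;
    } catch (NumberFormatException e) {
      return 1;
    }
  }

  // Base timeout in seconds multiplied by the factor, in milliseconds.
  static long scaledMillis(long baseSeconds, String rawFactor) {
    return TimeUnit.SECONDS.toMillis(baseSeconds) * parseFactor(rawFactor);
  }

  public static void main(String[] args) {
    String raw = System.getProperty("timeout.factor");
    // 9 minutes is the timeout mentioned in this thread for KerberosRenewalIT.
    System.out.println("timeout ms: " + scaledMillis(9 * 60, raw));
  }
}
```

Running with, for example, -Dtimeout.factor=2 would double the allowed time on slow hardware without editing the test.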

 

The actual behavior of the test is to just create a table, put in data, scan it, then delete the table, every 5 seconds for 8 minutes minimum, under the assumption that the Kerberos ticket will expire at some point during that time period, and Accumulo will automatically renew it and continue functioning (the actual condition of expiration and renewal is never checked). This seems like something that should be mocked out on the object responsible for detecting and handling the renewal, and not an 8-9 minute integration test. It's not even clear from the current test which code is responsible for that (i.e. which code this test is testing).
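To illustrate the mocking idea raised above, here is a minimal sketch of renewal logic driven by a fake clock, so that 8 minutes of ticket lifetime can be simulated instantly and the renewal itself can be checked. Everything in it (Clock, Ticket, Renewer, the 60-second lifetime, the 10-second renewal window) is hypothetical and for illustration only; it is not the Accumulo or Hadoop renewal code, which this thread does not identify.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Accumulo or Hadoop code: renewal logic with an
// injectable clock so expiration can be simulated without waiting minutes.
final class RenewalSketch {
  private RenewalSketch() {}

  // Source of "now" in milliseconds; tests supply a fake.
  interface Clock {
    long nowMillis();
  }

  // Minimal stand-in for a Kerberos ticket.
  interface Ticket {
    long expiresAtMillis();

    void renew(long newExpiryMillis);
  }

  static final class FakeClock implements Clock {
    private final AtomicLong now = new AtomicLong();

    @Override
    public long nowMillis() {
      return now.get();
    }

    void advance(long millis) {
      now.addAndGet(millis);
    }
  }

  static final class FakeTicket implements Ticket {
    private long expiresAt;
    int renewals; // counts renew() calls so a test can assert on them

    FakeTicket(long expiresAt) {
      this.expiresAt = expiresAt;
    }

    @Override
    public long expiresAtMillis() {
      return expiresAt;
    }

    @Override
    public void renew(long newExpiryMillis) {
      expiresAt = newExpiryMillis;
      renewals++;
    }
  }

  // Renews the ticket once less than windowMillis of lifetime remain.
  static final class Renewer {
    private final Clock clock;
    private final long lifetimeMillis;
    private final long windowMillis;

    Renewer(Clock clock, long lifetimeMillis, long windowMillis) {
      this.clock = clock;
      this.lifetimeMillis = lifetimeMillis;
      this.windowMillis = windowMillis;
    }

    // Returns true if a renewal was performed.
    boolean checkAndRenew(Ticket ticket) {
      long remaining = ticket.expiresAtMillis() - clock.nowMillis();
      if (remaining < windowMillis) {
        ticket.renew(clock.nowMillis() + lifetimeMillis);
        return true;
      }
      return false;
    }
  }

  public static void main(String[] args) {
    FakeClock clock = new FakeClock();
    FakeTicket ticket = new FakeTicket(60_000); // expires at t = 60s
    Renewer renewer = new Renewer(clock, 60_000, 10_000);

    // Simulate 8 minutes in 5-second steps, mirroring the cadence above.
    for (int i = 0; i < 96; i++) {
      clock.advance(5_000);
      renewer.checkAndRenew(ticket);
      if (ticket.expiresAtMillis() <= clock.nowMillis()) {
        throw new IllegalStateException("ticket expired at t=" + clock.nowMillis());
      }
    }
    System.out.println("renewals: " + ticket.renewals);
  }
}
```

Unlike the integration test as described, a test built this way can assert directly that a renewal happened, and it runs in milliseconds. It would not replace an end-to-end check against a real KDC.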

The most recent failure timed out after 9 minutes trying to create an Accumulo table. This could indicate that there's a problem with the ticket not renewing when there's an expiration waiting for a FATE operation... or it could just be that's where the test happened to be when the 9 minutes were up.

 

Is anybody else experiencing problems with this test?

 

In spite of this failure, I'm willing to give my +1 anyway, since I'm inclined to think this is simply an unreliable test.

 

On Fri, Mar 10, 2017 at 5:45 PM Keith Turner <keith@deenlo.com> wrote:

 

> I also verified the rfile fix.

> 

> On Fri, Mar 10, 2017 at 5:38 PM, Keith Turner <keith@deenlo.com> wrote:

> > +1

> >

> > Did the following :

> >

> >  * Was able to build Fluo against jars in staging repo.

> >  * Sigs check out for tarballs

> >  * No diffs between src tarball and rc2 branch

> >  * Looked at diffs between rc1 and rc2

> >

> >

> > On Fri, Mar 10, 2017 at 7:35 AM, Ed Coleman <dev1@etcoleman.com> wrote:

> >> Accumulo Developers,

> >>

> >>

> >>

> >> Please consider the following candidate for Accumulo 1.7.3. This
> >> candidate contains two changes from 1.7.3-rc1:
> >>
> >> - https://issues.apache.org/jira/browse/ACCUMULO-4600 - shell does
> >>   not fall back to accumulo-site.xml when on classpath.
> >>
> >> - https://issues.apache.org/jira/browse/ACCUMULO-4597 - NPE from
> >>   RFile PrintInfo

> >>

> >>

> >>

> >> Git Commit:

> >>

> >>     38d8a1d139eb21f0c9882be877db1b77aa1a45db

> >>

> >> Branch:

> >>

> >>     1.7.3-rc2

> >>

> >>

> >>

> >> If this vote passes, a gpg-signed tag will be created using:

> >>

> >>     git tag -f -m 'Apache Accumulo 1.7.3' -s rel/1.7.3 38d8a1d139eb21f0c9882be877db1b77aa1a45db

> >>

> >>

> >>

> >> Staging repo:
> >>
> >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065
> >>
> >> Source (official release artifact):
> >>
> >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-src.tar.gz
> >>
> >> Binary:
> >>
> >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
> >>
> >> (Append ".sha1", ".md5", or ".asc" to download the signature/hash for a given artifact.)

> >>

> >>

> >>

> >> All artifacts were built and staged with:

> >>

> >>     mvn release:prepare && mvn release:perform

> >>

> >>

> >>

> >> Signing keys are available at https://www.apache.org/dist/accumulo/KEYS

> >>

> >> (Expected fingerprint: D87F9F417753D0C88598437EFC4368E0864BCC36)

> >>

> >>

> >>

> >> Release notes (in progress) can be found at: https://accumulo.apache.org/release_notes/1.7.3

> >>

> >>

> >>

> >> Please vote one of:

> >>

> >> [ ] +1 - I have verified and accept...

> >>

> >> [ ] +0 - I have reservations, but not strong enough to vote against...

> >>

> >> [ ] -1 - Because..., I do not accept...

> >>

> >> ... these artifacts as the 1.7.3 release of Apache Accumulo.

> >>

> >>

> >>

> >> This vote will end on Mon Mar 13 13:00:00 UTC 2017

> >>

> >> (Mon Mar 13 09:00:00 EDT 2017 / Mon Mar 13 06:00:00 PDT 2017)

> >>

> >>

> >>

> >> Thanks!

> >>

> >>

> >>

> >> P.S. Hint: download the whole staging repo with

> >>

> >>     wget -erobots=off -r -l inf -np -nH \
> >>     https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/
> >>     # note the trailing slash is needed

> >>

> 


Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

Posted by Christopher <ct...@apache.org>.
I didn't say I had a better way. I explained what I saw, and I solicited
feedback from the community. I merely hinted at a *possible* better way,
but I wanted to see what the community thought. I'm just trying to discuss
it.

If you don't want to engage with the discussion, simply don't... but please
stop trying to discourage folks from *having* the discussion if *they* are
getting value out of it.

On Sat, Mar 11, 2017 at 7:24 PM Josh Elser <jo...@gmail.com> wrote:

> This is a do-ocracy. Please just change the test if you believe you have a
> better way to test what it is trying to test.
>
> On Mar 11, 2017 18:43, "Christopher" <ct...@apache.org> wrote:
>
> > On Sat, Mar 11, 2017 at 5:15 PM Josh Elser <jo...@gmail.com> wrote:
> >
> > > Christopher,
> > >
> > > When I wrote that test, there were issues with the minimum functioning
> > > renewal period as provided by the embedded KDC from Kerby. That is why
> > > this test runs for so long -- anything shorter failed.
> > >
> > >
> > I understand that. There was a comment in the code to that effect.
> >
> >
> > > This test passed at one point. I don't run tests on my own hardware to
> > > catch regressions anymore after previous discussions with you on this
> > > matter.
> >
> >
> > I don't understand what you mean by this, or how it applies. I'm sure it
> > did pass at one point... and may still (hence my question to the group
> > asking whether they observed it passing).
> >
> >
> > > In the future, I'd suggest investing the time into investigating
> > > why the test actually failed instead of picking apart the test itself.
> > >
> > >
> > I did preliminary investigation, and forwarded my observations to the group
> > for further discussion. I even suggested a possible cause for the failure.
> > But I didn't think it would be productive to dig any deeper without first
> > raising what I found to the group for further discussion and feedback.
> >
> > "picking apart the test itself" is also known as "reviewing code" and
> > "investigating". I think you're taking my criticism of the code personally,
> > and I'm not sure why. The fact is, I got as far as I could at 1AM on
> > Saturday, and informed the group of what I experienced, because I thought
> > it was relevant to the vote which expires on Monday morning. It seems that
> > you'd prefer I postpone my comments until I have some kind of "perfect
> > knowledge" of what went wrong with the test and how to fix it. Aside from
> > the fact that I knew that I wasn't going to have time before the vote
> > concluded on Monday, that makes no sense to me even under ideal
> > circumstances... if we all did that, why would we even have a group? We're
> > better when we rely on each other's expertise and knowledge, and discuss
> > problems (or potential problems) as a team. I would like to see this test
> > improved, but I knew that working on it in silence on my own was not going
> > to achieve that.
> >
> >
> > > Thanks.
> > >
> > >
> >
>

Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

Posted by Josh Elser <jo...@gmail.com>.
This is a do-ocracy. Please just change the test if you believe you have a
better way to test what it is trying to test.

On Mar 11, 2017 18:43, "Christopher" <ct...@apache.org> wrote:

> On Sat, Mar 11, 2017 at 5:15 PM Josh Elser <jo...@gmail.com> wrote:
>
> > Christopher,
> >
> > When I wrote that test, there were issues with the minimum functioning
> > renewal period as provided by the embedded KDC from Kerby. That is why
> > this test runs for so long -- anything shorter failed.
> >
> >
> I understand that. There was a comment in the code to that effect.
>
>
> > This test passed at one point. I don't run tests on my own hardware to
> > catch regressions anymore after previous discussions with you on this
> > matter.
>
>
> I don't understand what you mean by this, or how it applies. I'm sure it
> did pass at one point... and may still (hence my question to the group
> asking whether they observed it passing).
>
>
> > In the future, I'd suggest investing the time into investigating
> > why the test actually failed instead of picking apart the test itself.
> >
> >
> I did preliminary investigation, and forwarded my observations to the group
> for further discussion. I even suggested a possible cause for the failure.
> But I didn't think it would be productive to dig any deeper without first
> raising what I found to the group for further discussion and feedback.
>
> "picking apart the test itself" is also known as "reviewing code" and
> "investigating". I think you're taking my criticism of the code personally,
> and I'm not sure why. The fact is, I got as far as I could at 1AM on
> Saturday, and informed the group of what I experienced, because I thought
> it was relevant to the vote which expires on Monday morning. It seems that
> you'd prefer I postpone my comments until I have some kind of "perfect
> knowledge" of what went wrong with the test and how to fix it. Aside from
> the fact that I knew that I wasn't going to have time before the vote
> concluded on Monday, that makes no sense to me even under ideal
> circumstances... if we all did that, why would we even have a group? We're
> better when we rely on each other's expertise and knowledge, and discuss
> problems (or potential problems) as a team. I would like to see this test
> improved, but I knew that working on it in silence on my own was not going
> to achieve that.
>
>
> > Thanks.
> >
> >
>

Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

Posted by Christopher <ct...@apache.org>.
On Sat, Mar 11, 2017 at 5:15 PM Josh Elser <jo...@gmail.com> wrote:

> Christopher,
>
> When I wrote that test, there were issues with the minimum functioning
> renewal period as provided by the embedded KDC from Kerby. That is why
> this test runs for so long -- anything shorter failed.
>
>
I understand that. There was a comment in the code to that effect.


> This test passed at one point. I don't run tests on my own hardware to
> catch regressions anymore after previous discussions with you on this
> matter.


I don't understand what you mean by this, or how it applies. I'm sure it
did pass at one point... and may still (hence my question to the group
asking whether they observed it passing).


> In the future, I'd suggest investing the time into investigating
> why the test actually failed instead of picking apart the test itself.
>
>
I did preliminary investigation, and forwarded my observations to the group
for further discussion. I even suggested a possible cause for the failure.
But I didn't think it would be productive to dig any deeper without first
raising what I found to the group for further discussion and feedback.

"picking apart the test itself" is also known as "reviewing code" and
"investigating". I think you're taking my criticism of the code personally,
and I'm not sure why. The fact is, I got as far as I could at 1AM on
Saturday, and informed the group of what I experienced, because I thought
it was relevant to the vote which expires on Monday morning. It seems that
you'd prefer I postpone my comments until I have some kind of "perfect
knowledge" of what went wrong with the test and how to fix it. Aside from
the fact that I knew that I wasn't going to have time before the vote
concluded on Monday, that makes no sense to me even under ideal
circumstances... if we all did that, why would we even have a group? We're
better when we rely on each other's expertise and knowledge, and discuss
problems (or potential problems) as a team. I would like to see this test
improved, but I knew that working on it in silence on my own was not going
to achieve that.


> Thanks.
>
> Ed Coleman wrote:
> > I had commented on https://issues.apache.org/jira/browse/ACCUMULO-4602
> that I often have trouble with this and a few others.
> >
> >
> >
> > Not sure it makes me feel any better, but for me, this is not "new" to
> > 1.7.3. I thought it could be due to my virtual-box development environment,
> > but I've tried running verify on an AWS c4.2xlarge instance with the same
> > intermittent results. I have had it pass, but more often than not it fails.
> >
> > To help decide if 1.7.3-rc0 could be a candidate, I made the following
> > chart tracking IT issues – and then at one point the KerberosRenewalIT
> > passed for me (and it passed a few times in a row) and I stopped updating
> > the chart:
> >
> >
> >
> >
> >
> >                                                                                               | Instance Type
> > Test                                                                                          | AWS1 | AWS2 | AW3  | OpenBox 1 | OpenBox 2 | OpenBox3
> > AssignmentThreadsIT.testConcurrentAssignmentPerformance:91                                    |      | x    | x    | x         |           |
> > BadDeleteMarkersCreatedIT>AccumuloClusterIT.teardownCluster:223 » TestTimedOut                |      |      |      |           |           | x
> > ChaoticBalancerIT.test:80->Object.wait:502->Object.wait:-2 » TestTimedOut test...             |      | x    |      | x         |           |
> > ConditionalWriterIT.testTrace:1476 » TestTimedOut test timed out after 60 seco...             |      |      | x    |           |           |
> > DurabilityIT.testWriteSpeed:103 log should be faster than flush                               | x    | x    | x    | x         |           |
> > FateStarvationIT.run:79 » Runtime java.lang.RuntimeException: org.apache.zooke...             |      |      | x    |           |           |
> > KerberosRenewalIT.testReadAndWriteThroughTicketLifetime » TestTimedOut test ti...             | x    | x    | x    | x         | x         | x
> > ShellServerIT.trace:1444                                                                      | x    |      | x    |           |           |
> > TabletStateChangeIteratorIT.test:100 No tables should need attention expected:<0> but was:<1> | x    |      |      |           | x         |
> > UnorderedWorkAssignerReplicationIT.dataWasReplicatedToThePeerWithoutDrain:548 » TableOffline  |      |      |      |           |           |
> > KerberosReplicationIT.dataReplicatedToCorrectTable:224 » TestTimedOut test tim...             |      |      |      |           |           | x
> >
> >
> >
> > I am seeing the same intermittent failures with 1.7.3-rc1 and 1.7.3-rc2.
> >
> >
> >
> > -----Original Message-----
> > From: Christopher [mailto:ctubbsii@apache.org]
> > Sent: Saturday, March 11, 2017 1:53 AM
> > To: Accumulo Dev List<de...@accumulo.apache.org>
> > Subject: Re: [VOTE] Accumulo 1.7.3-rc2
> >
> >
> >
> > +1, reluctantly, due to KerberosRenewalIT failures described below.
> >
> >
> >
> > Verified hashes/sigs/javadoc jars/source jars/git SHA1/tarball
> contents/license stuffs/ITs.
> >
> >
> >
> > I could not get KerberosRenewalIT to pass at all (I tried half a dozen
> > times). It keeps timing out. It looks like it's supposed to finish between
> > 8 and 9 minutes... an insanely long time for a *single* test to be
> > running, IMO, especially one as narrowly focused as this one
> > (ShellServerIT, for example, runs about that long, but covers a very broad
> > spectrum of Accumulo behavior). This test ignores the scaling parameter,
> > too, so it cannot be scaled with the timeout.factor system property.
> >
> >
> >
> > The actual behavior of the test is to just create a table, put in data,
> > scan it, then delete the table, every 5 seconds for 8 minutes minimum,
> > under the assumption that the Kerberos ticket will expire at some point
> > during that time period, and Accumulo will automatically renew it and
> > continue functioning (the actual condition of expiration and renewal is
> > never checked). This seems like something that should be mocked out on the
> > object responsible for detecting and handling the renewal, and not an
> > 8-9 minute integration test. It's not even clear from the current test
> > which code is responsible for that (e.g. which code this test is testing).
> >
> > The most recent failure timed out after 9 minutes trying to create an
> Accumulo table. This could indicate that there's a problem with the ticket
> not renewing when there's an expiration waiting for a FATE operation... or
> it could just be that's where the test happened to be when the 9 minutes
> were up.
> >
> >
> >
> > Is anybody else experiencing problems with this test?
> >
> >
> >
> > In spite of this failure, I'm willing to give my +1 anyway, since I'm
> inclined to think this is simply an unreliable test.
> >
> >
> >
> > On Fri, Mar 10, 2017 at 5:45 PM Keith Turner <keith@deenlo.com> wrote:
> >
> >> I also verified the rfile fix.
> >>
> >> On Fri, Mar 10, 2017 at 5:38 PM, Keith Turner <keith@deenlo.com> wrote:
> >>> +1
> >>>
> >>> Did the following:
> >>>
> >>>   * Was able to build Fluo against jars in staging repo.
> >>>   * Sigs checkout for tarballs
> >>>   * No diffs between src tarball and rc2 branch
> >>>   * Looked at diffs between rc1 and rc2
> >>>
> >>> On Fri, Mar 10, 2017 at 7:35 AM, Ed Coleman <dev1@etcoleman.com> wrote:
> >>>> Accumulo Developers,
> >>>>
> >>>> Please consider the following candidate for Accumulo 1.7.3. This candidate
> >>>> contains two changes from 1.7.3-rc1:
> >>>>
> >>>> - https://issues.apache.org/jira/browse/ACCUMULO-4600 - shell does
> >>>>   not fall back to accumulo-site.xml when on classpath.
> >>>> - https://issues.apache.org/jira/browse/ACCUMULO-4597 - NPE from
> >>>>   RFile PrintInfo
> >>>>
> >>>> Git Commit:
> >>>>     38d8a1d139eb21f0c9882be877db1b77aa1a45db
> >>>> Branch:
> >>>>     1.7.3-rc2
> >>>>
> >>>> If this vote passes, a gpg-signed tag will be created using:
> >>>>     git tag -f -m 'Apache Accumulo 1.7.3' -s rel/1.7.3 38d8a1d139eb21f0c9882be877db1b77aa1a45db
> >>>>
> >>>> Staging repo:
> >>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065
> >>>> Source (official release artifact):
> >>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-src.tar.gz
> >>>> Binary:
> >>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
> >>>> (Append ".sha1", ".md5", or ".asc" to download the signature/hash
> >>>> for a given artifact.)
> >>>>
> >>>> All artifacts were built and staged with:
> >>>>     mvn release:prepare && mvn release:perform
> >>>>
> >>>> Signing keys are available at
> >>>> https://www.apache.org/dist/accumulo/KEYS
> >>>> (Expected fingerprint: D87F9F417753D0C88598437EFC4368E0864BCC36)
> >>>>
> >>>> Release notes (in progress) can be found at:
> >>>> https://accumulo.apache.org/release_notes/1.7.3
> >>>>
> >>>> Please vote one of:
> >>>> [ ] +1 - I have verified and accept...
> >>>> [ ] +0 - I have reservations, but not strong enough to vote against...
> >>>> [ ] -1 - Because..., I do not accept...
> >>>> ... these artifacts as the 1.7.3 release of Apache Accumulo.
> >>>>
> >>>> This vote will end on Mon Mar 13 13:00:00 UTC 2017
> >>>> (Mon Mar 13 09:00:00 EDT 2017 / Mon Mar 13 06:00:00 PDT 2017)
> >>>>
> >>>> Thanks!
> >>>>
> >>>> P.S. Hint: download the whole staging repo with
> >>>>     wget -erobots=off -r -l inf -np -nH \
> >>>>     https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/
> >>>>     # note the trailing slash is needed
>

Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

Posted by Josh Elser <jo...@gmail.com>.
Christopher,

When I wrote that test, there were issues with the minimum functioning 
renewal period as provided by the embedded KDC from Kerby. That is why 
this test runs for so long -- anything shorter failed.

This test passed at one point. I don't run tests on my own hardware to 
catch regressions anymore after previous discussions with you on this 
matter. In the future, I'd suggest investing the time into investigating 
why the test actually failed instead of picking apart the test itself.

Thanks.

Ed Coleman wrote:
> I had commented on https://issues.apache.org/jira/browse/ACCUMULO-4602 that I often have trouble with this and a few others.
>
>
>
> Not sure it makes me feel any better, but for me, this is not "new" to 1.7.3. I thought it could be due to my virtual-box development environment, but I've tried running verify on an AWS c4.2xlarge instance with the same intermittent results. I have had it pass, but more often than not it fails.
>
> To help decide if 1.7.3-rc0 could be a candidate, I made the following chart tracking IT issues – and then at one point the KerberosRenewalIT passed for me (and it passed a few times in a row) and I stopped updating the chart:
>
>
>
> 							
> 	
>                                                                                               | Instance Type
> Test                                                                                          | AWS1 | AWS2 | AW3  | OpenBox 1 | OpenBox 2 | OpenBox3
> AssignmentThreadsIT.testConcurrentAssignmentPerformance:91                                    |      | x    | x    | x         |           |
> BadDeleteMarkersCreatedIT>AccumuloClusterIT.teardownCluster:223 » TestTimedOut                |      |      |      |           |           | x
> ChaoticBalancerIT.test:80->Object.wait:502->Object.wait:-2 » TestTimedOut test...             |      | x    |      | x         |           |
> ConditionalWriterIT.testTrace:1476 » TestTimedOut test timed out after 60 seco...             |      |      | x    |           |           |
> DurabilityIT.testWriteSpeed:103 log should be faster than flush                               | x    | x    | x    | x         |           |
> FateStarvationIT.run:79 » Runtime java.lang.RuntimeException: org.apache.zooke...             |      |      | x    |           |           |
> KerberosRenewalIT.testReadAndWriteThroughTicketLifetime » TestTimedOut test ti...             | x    | x    | x    | x         | x         | x
> ShellServerIT.trace:1444                                                                      | x    |      | x    |           |           |
> TabletStateChangeIteratorIT.test:100 No tables should need attention expected:<0> but was:<1> | x    |      |      |           | x         |
> UnorderedWorkAssignerReplicationIT.dataWasReplicatedToThePeerWithoutDrain:548 » TableOffline  |      |      |      |           |           |
> KerberosReplicationIT.dataReplicatedToCorrectTable:224 » TestTimedOut test tim...             |      |      |      |           |           | x
>
>
>
> I am seeing the same intermittent failures with 1.7.3-rc1 and 1.7.3-rc2.
>
>
>
> -----Original Message-----
> From: Christopher [mailto:ctubbsii@apache.org]
> Sent: Saturday, March 11, 2017 1:53 AM
> To: Accumulo Dev List<de...@accumulo.apache.org>
> Subject: Re: [VOTE] Accumulo 1.7.3-rc2
>
>
>
> +1, reluctantly, due to KerberosRenewalIT failures described below.
>
>
>
> Verified hashes/sigs/javadoc jars/source jars/git SHA1/tarball contents/license stuffs/ITs.
>
>
>
> I could not get KerberosRenewalIT to pass at all (I tried half a dozen times). It keeps timing out. It looks like it's supposed to finish between 8 and 9 minutes... an insanely long time for a *single* test to be running, IMO, especially one as narrowly focused as this one (ShellServerIT, for example, runs about that long, but covers a very broad spectrum of Accumulo behavior). This test ignores the scaling parameter, too, so it cannot be scaled with the timeout.factor system property.
>
>
>
> The actual behavior of the test is to just create a table, put in data, scan it, then delete the table, every 5 seconds for 8 minutes minimum, under the assumption that the Kerberos ticket will expire at some point during that time period, and Accumulo will automatically renew it and continue functioning (the actual condition of expiration and renewal is never checked). This seems like something that should be mocked out on the object responsible for detecting and handling the renewal, and not an 8-9 minute integration test. It's not even clear from the current test which code is responsible for that (e.g. which code this test is testing).
>
> The most recent failure timed out after 9 minutes trying to create an Accumulo table. This could indicate that there's a problem with the ticket not renewing when there's an expiration waiting for a FATE operation... or it could just be that's where the test happened to be when the 9 minutes were up.
>
>
>
> Is anybody else experiencing problems with this test?
>
>
>
> In spite of this failure, I'm willing to give my +1 anyway, since I'm inclined to think this is simply an unreliable test.
>
>
>
> On Fri, Mar 10, 2017 at 5:45 PM Keith Turner <keith@deenlo.com> wrote:
>
>> I also verified the rfile fix.
>>
>> On Fri, Mar 10, 2017 at 5:38 PM, Keith Turner <keith@deenlo.com> wrote:
>>> +1
>>>
>>> Did the following:
>>>
>>>   * Was able to build Fluo against jars in staging repo.
>>>   * Sigs checkout for tarballs
>>>   * No diffs between src tarball and rc2 branch
>>>   * Looked at diffs between rc1 and rc2
>>>
>>> On Fri, Mar 10, 2017 at 7:35 AM, Ed Coleman <dev1@etcoleman.com> wrote:
>>>> Accumulo Developers,
>>>>
>>>> Please consider the following candidate for Accumulo 1.7.3. This candidate
>>>> contains two changes from 1.7.3-rc1:
>>>>
>>>> - https://issues.apache.org/jira/browse/ACCUMULO-4600 - shell does
>>>>   not fall back to accumulo-site.xml when on classpath.
>>>> - https://issues.apache.org/jira/browse/ACCUMULO-4597 - NPE from
>>>>   RFile PrintInfo
>>>>
>>>> Git Commit:
>>>>     38d8a1d139eb21f0c9882be877db1b77aa1a45db
>>>> Branch:
>>>>     1.7.3-rc2
>>>>
>>>> If this vote passes, a gpg-signed tag will be created using:
>>>>     git tag -f -m 'Apache Accumulo 1.7.3' -s rel/1.7.3 38d8a1d139eb21f0c9882be877db1b77aa1a45db
>>>>
>>>> Staging repo:
>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065
>>>> Source (official release artifact):
>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-src.tar.gz
>>>> Binary:
>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
>>>> (Append ".sha1", ".md5", or ".asc" to download the signature/hash
>>>> for a given artifact.)
>>>>
>>>> All artifacts were built and staged with:
>>>>     mvn release:prepare && mvn release:perform
>>>>
>>>> Signing keys are available at
>>>> https://www.apache.org/dist/accumulo/KEYS
>>>> (Expected fingerprint: D87F9F417753D0C88598437EFC4368E0864BCC36)
>>>>
>>>> Release notes (in progress) can be found at:
>>>> https://accumulo.apache.org/release_notes/1.7.3
>>>>
>>>> Please vote one of:
>>>> [ ] +1 - I have verified and accept...
>>>> [ ] +0 - I have reservations, but not strong enough to vote against...
>>>> [ ] -1 - Because..., I do not accept...
>>>> ... these artifacts as the 1.7.3 release of Apache Accumulo.
>>>>
>>>> This vote will end on Mon Mar 13 13:00:00 UTC 2017
>>>> (Mon Mar 13 09:00:00 EDT 2017 / Mon Mar 13 06:00:00 PDT 2017)
>>>>
>>>> Thanks!
>>>>
>>>> P.S. Hint: download the whole staging repo with
>>>>     wget -erobots=off -r -l inf -np -nH \
>>>>     https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/
>>>>     # note the trailing slash is needed

RE: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

Posted by Ed Coleman <de...@etcoleman.com>.
Sorry, the formatting seemed to get lost. It was a pretty chart - had colors and everything.

-----Original Message-----
From: Ed Coleman [mailto:dev1@etcoleman.com] 
Sent: Saturday, March 11, 2017 7:06 AM
To: dev@accumulo.apache.org
Subject: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

I had commented on https://issues.apache.org/jira/browse/ACCUMULO-4602 that I often have trouble with this and a few others.

 

Not sure it makes me feel any better, but for me, this is not "new" to 1.7.3. I thought it could be due to my virtual-box development environment, but I've tried running verify on an AWS c4.2xlarge instance with the same intermittent results. I have had it pass, but more often than not it fails.

To help decide if 1.7.3-rc0 could be a candidate, I made the following chart tracking IT issues – and then at one point the KerberosRenewalIT passed for me (and it passed a few times in a row) and I stopped updating the chart:

 

							
	
                                                                                              | Instance Type
Test                                                                                          | AWS1 | AWS2 | AW3  | OpenBox 1 | OpenBox 2 | OpenBox3
AssignmentThreadsIT.testConcurrentAssignmentPerformance:91                                    |      | x    | x    | x         |           |
BadDeleteMarkersCreatedIT>AccumuloClusterIT.teardownCluster:223 » TestTimedOut                |      |      |      |           |           | x
ChaoticBalancerIT.test:80->Object.wait:502->Object.wait:-2 » TestTimedOut test...             |      | x    |      | x         |           |
ConditionalWriterIT.testTrace:1476 » TestTimedOut test timed out after 60 seco...             |      |      | x    |           |           |
DurabilityIT.testWriteSpeed:103 log should be faster than flush                               | x    | x    | x    | x         |           |
FateStarvationIT.run:79 » Runtime java.lang.RuntimeException: org.apache.zooke...             |      |      | x    |           |           |
KerberosRenewalIT.testReadAndWriteThroughTicketLifetime » TestTimedOut test ti...             | x    | x    | x    | x         | x         | x
ShellServerIT.trace:1444                                                                      | x    |      | x    |           |           |
TabletStateChangeIteratorIT.test:100 No tables should need attention expected:<0> but was:<1> | x    |      |      |           | x         |
UnorderedWorkAssignerReplicationIT.dataWasReplicatedToThePeerWithoutDrain:548 » TableOffline  |      |      |      |           |           |
KerberosReplicationIT.dataReplicatedToCorrectTable:224 » TestTimedOut test tim...             |      |      |      |           |           | x

 

I am seeing the same intermittent failures with 1.7.3-rc1 and 1.7.3-rc2.

 

-----Original Message-----
From: Christopher [mailto:ctubbsii@apache.org] 
Sent: Saturday, March 11, 2017 1:53 AM
To: Accumulo Dev List <de...@accumulo.apache.org>
Subject: Re: [VOTE] Accumulo 1.7.3-rc2

 

+1, reluctantly, due to KerberosRenewalIT failures described below.

 

Verified hashes/sigs/javadoc jars/source jars/git SHA1/tarball contents/license stuffs/ITs.

 

I could not get KerberosRenewalIT to pass at all (I tried half a dozen times). It keeps timing out. It looks like it's supposed to finish between 8 and 9 minutes... an insanely long time for a *single* test to be running, IMO, especially one as narrowly focused as this one (ShellServerIT, for example, runs about that long, but covers a very broad spectrum of Accumulo behavior). This test ignores the scaling parameter, too, so it cannot be scaled with the timeout.factor system property.
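
To illustrate the scaling point, here is a minimal sketch of how a test timeout might honor the timeout.factor system property. Only the property name comes from the message; the ScaledTimeout helper, its fall-back-to-1 rule, and the 9-minute base value are assumptions for illustration, not Accumulo's actual implementation.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not a class from the Accumulo code base.
final class ScaledTimeout {
    // Property name as mentioned in the email; how it is parsed here is an assumption.
    static final String PROPERTY = "timeout.factor";

    private ScaledTimeout() {}

    /** Parses the factor, falling back to 1 for missing, non-numeric, or values below 1. */
    static int parseFactor(String raw) {
        if (raw == null) {
            return 1;
        }
        try {
            int factor = Integer.parseInt(raw.trim());
            return factor < 1 ? 1 : factor;
        } catch (NumberFormatException e) {
            return 1;
        }
    }

    /** Multiplies the base timeout by the factor read from the system property. */
    static Duration scale(Duration base) {
        return base.multipliedBy(parseFactor(System.getProperty(PROPERTY)));
    }

    public static void main(String[] args) {
        // Example: run with -Dtimeout.factor=2 to double the base timeout.
        Duration base = Duration.ofMinutes(9);
        System.out.println("scaled timeout: " + scale(base));
    }
}
```

A test that computes its limit this way can be given more time on slow hardware without editing the test itself.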

 

The actual behavior of the test is to just create a table, put in data, scan it, then delete the table, every 5 seconds for 8 minutes minimum, under the assumption that the Kerberos ticket will expire at some point during that time period, and Accumulo will automatically renew it and continue functioning (the actual condition of expiration and renewal is never checked). This seems like something that should be mocked out on the object responsible for detecting and handling the renewal, and not an 8-9 minute integration test. It's not even clear from the current test which code is responsible for that (e.g. which code this test is testing).

The most recent failure timed out after 9 minutes trying to create an Accumulo table. This could indicate that there's a problem with the ticket not renewing when there's an expiration waiting for a FATE operation... or it could just be that's where the test happened to be when the 9 minutes were up.
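
As an illustration of the mocking suggestion above, here is a hedged sketch of exercising renewal logic against simulated time instead of an 8-9 minute wall-clock run. TicketRenewer, its 5-minute lifetime, and its 1-minute renewal window are hypothetical stand-ins; the thread does not identify which Accumulo code performs the renewal, so none of these names are real Accumulo APIs.

```java
import java.time.Duration;
import java.time.Instant;

// Drives the hypothetical renewer with simulated timestamps; no real waiting occurs.
final class RenewalSketch {
    /** Steps simulated time from a fixed start through 'total' (inclusive) in 'step' increments. */
    static TicketRenewer simulate(Duration total, Duration step) {
        Instant start = Instant.parse("2017-03-11T00:00:00Z");
        TicketRenewer renewer =
            new TicketRenewer(start, Duration.ofMinutes(5), Duration.ofMinutes(1));
        Instant end = start.plus(total);
        for (Instant now = start; !now.isAfter(end); now = now.plus(step)) {
            renewer.renewIfNeeded(now);
            // Unlike the integration test described above, validity is checked explicitly.
            if (!renewer.isValid(now)) {
                throw new IllegalStateException("ticket expired at " + now);
            }
        }
        return renewer;
    }

    public static void main(String[] args) {
        TicketRenewer renewer = simulate(Duration.ofMinutes(8), Duration.ofSeconds(5));
        System.out.println("renewals observed: " + renewer.renewals());
    }
}

// Hypothetical stand-in for the component that would handle ticket renewal.
final class TicketRenewer {
    private final Duration lifetime;
    private final Duration renewWindow;
    private Instant expiresAt;
    private int renewals = 0;

    TicketRenewer(Instant issuedAt, Duration lifetime, Duration renewWindow) {
        this.lifetime = lifetime;
        this.renewWindow = renewWindow;
        this.expiresAt = issuedAt.plus(lifetime);
    }

    /** Renews once 'now' reaches the renewal window before expiry; returns true if it renewed. */
    boolean renewIfNeeded(Instant now) {
        if (now.isBefore(expiresAt.minus(renewWindow))) {
            return false;
        }
        expiresAt = now.plus(lifetime);
        renewals++;
        return true;
    }

    boolean isValid(Instant now) {
        return now.isBefore(expiresAt);
    }

    int renewals() {
        return renewals;
    }
}
```

Because time is passed in as a parameter, the check finishes in milliseconds and asserts directly that a renewal happened, which the long-running test never verifies.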

 

Is anybody else experiencing problems with this test?

 

In spite of this failure, I'm willing to give my +1 anyway, since I'm inclined to think this is simply an unreliable test.

 

On Fri, Mar 10, 2017 at 5:45 PM Keith Turner <keith@deenlo.com> wrote:

> I also verified the rfile fix.
>
> On Fri, Mar 10, 2017 at 5:38 PM, Keith Turner <keith@deenlo.com> wrote:
> > +1
> >
> > Did the following:
> >
> >  * Was able to build Fluo against jars in staging repo.
> >  * Sigs checkout for tarballs
> >  * No diffs between src tarball and rc2 branch
> >  * Looked at diffs between rc1 and rc2
> >
> > On Fri, Mar 10, 2017 at 7:35 AM, Ed Coleman <dev1@etcoleman.com> wrote:
> >> Accumulo Developers,
> >>
> >> Please consider the following candidate for Accumulo 1.7.3. This candidate
> >> contains two changes from 1.7.3-rc1:
> >>
> >> - https://issues.apache.org/jira/browse/ACCUMULO-4600 - shell does
> >>   not fall back to accumulo-site.xml when on classpath.
> >> - https://issues.apache.org/jira/browse/ACCUMULO-4597 - NPE from
> >>   RFile PrintInfo
> >>
> >> Git Commit:
> >>     38d8a1d139eb21f0c9882be877db1b77aa1a45db
> >> Branch:
> >>     1.7.3-rc2
> >>
> >> If this vote passes, a gpg-signed tag will be created using:
> >>     git tag -f -m 'Apache Accumulo 1.7.3' -s rel/1.7.3 38d8a1d139eb21f0c9882be877db1b77aa1a45db
> >>
> >> Staging repo:
> >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065
> >> Source (official release artifact):
> >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-src.tar.gz
> >> Binary:
> >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
> >> (Append ".sha1", ".md5", or ".asc" to download the signature/hash
> >> for a given artifact.)
> >>
> >> All artifacts were built and staged with:
> >>     mvn release:prepare && mvn release:perform
> >>
> >> Signing keys are available at
> >> https://www.apache.org/dist/accumulo/KEYS
> >> (Expected fingerprint: D87F9F417753D0C88598437EFC4368E0864BCC36)
> >>
> >> Release notes (in progress) can be found at:
> >> https://accumulo.apache.org/release_notes/1.7.3
> >>
> >> Please vote one of:
> >> [ ] +1 - I have verified and accept...
> >> [ ] +0 - I have reservations, but not strong enough to vote against...
> >> [ ] -1 - Because..., I do not accept...
> >> ... these artifacts as the 1.7.3 release of Apache Accumulo.
> >>
> >> This vote will end on Mon Mar 13 13:00:00 UTC 2017
> >> (Mon Mar 13 09:00:00 EDT 2017 / Mon Mar 13 06:00:00 PDT 2017)
> >>
> >> Thanks!
> >>
> >> P.S. Hint: download the whole staging repo with
> >>     wget -erobots=off -r -l inf -np -nH \
> >>     https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/
> >>     # note the trailing slash is needed
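
The quoted vote email notes that appending ".sha1" to an artifact URL yields its hash. As a rough sketch of the "verified hashes" step, the Java below computes a SHA-1 digest and compares it with the contents of such a file. It assumes the .sha1 file holds the hex digest as its first whitespace-delimited token; the Sha1Check class and its method names are hypothetical and not part of any Accumulo tooling.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only.
final class Sha1Check {
    private Sha1Check() {}

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required of every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Compares the artifact's digest with the first token of the .sha1 file contents. */
    static boolean matches(byte[] artifact, String sha1FileContents) {
        String expected = sha1FileContents.trim().split("\\s+")[0];
        return sha1Hex(artifact).equalsIgnoreCase(expected);
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded artifact; args[1]: path to its .sha1 file.
        // Reads the whole artifact into memory, which is acceptable for a sketch.
        byte[] artifact = Files.readAllBytes(Paths.get(args[0]));
        String expected =
            new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println(matches(artifact, expected) ? "OK" : "MISMATCH");
    }
}
```

A matching hash only shows the download is intact; checking the ".asc" signature against the KEYS file is a separate step done with gpg.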



Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

Posted by Christopher <ct...@apache.org>.
I agree with Josh. I think we may have discussed flexibility on voting time
before, and the consensus seemed to be "discretion of RM" to "call it" some
time after 72 hours had passed, but not too long after.

On Mon, Mar 13, 2017 at 6:28 PM Josh Elser <jo...@gmail.com> wrote:

> IMO, you have agreement via the votes cast and the vote itself has
> lasted 72 hours. Ship it :)
>
> ASF rules state that the voting period should be at least 72 hours; that
> caveat isn't included well in our docs. You can leave the vote open for
> longer if you want, but, again, I don't think you should.
>
> Ed Coleman wrote:
> > I am inclined to close the vote as passed and press on with the other
> release artifacts.  Just need to read the rest of today's emails first in
> case there is something there that indicates we should not move forward
> with 1.7.3
> >
> > A few notes, mainly for future discussion / clarification on my part.
> (Can make these [discuss] subjects, if that would reach a wider / different
> audience.)
> >
> > When I read the rules for voting on the Accumulo release guide (
> http://accumulo.apache.org/contributor/making-release) - in the Vote
> paragraph, the text says that "Voting shall last 72 hours..." This led me
> to believe that there was no flexibility to extend the time for the vote. I
> debated on pushing the release on Friday and having it run through the
> weekend - or waiting. On the plus side for a Friday AM release was that
> gave a full day on Friday and then the weekend in case anyone was in a
> position to run any of the tests, esp. the long running ones. On the minus,
> well, the weekend. If there was flexibility to extend the vote, say the
> text read "shall last at least 72 hours" - I would have delayed the vote
> close to maybe Tuesday or Wednesday AM.
> >
> > It will take me a while to get the release notes ready - if we can
> improve the tests before then, I have no issues with delaying the release
> and re-voting (if this is allowed). I'm also willing to push a 1.7.4 with
> any test changes soon - if that is the consensus of the community. I'd like
> to get this version released, but I also would like it to be as correct as
> possible - so whatever can facilitate meeting these two somewhat
> conflicting goals, +1 from me, and of course, whatever is the will of the
> group.
> >
> > Ed Coleman
> >
> > -----Original Message-----
> > From: Christopher [mailto:ctubbsii@apache.org]
> > Sent: Monday, March 13, 2017 5:49 PM
> > To: dev@accumulo.apache.org
> > Subject: Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2
> >
> > Okay Ed, glad to know I'm not the only one, but also glad to know that
> they all do pass sometimes.
> > It looks like the vote thread got enough +1s to pass (if you're ready to
> call it), and we can investigate these test failures further on that ticket
> Josh created, or any others which might be relevant.
> >
> > If I have time later this week, I'll try to track down what code is
> responsible for the actual renewal, and see why the KerberosRenewalIT might
> be failing. I'm not too optimistic I'll be able to find the problem (I
> don't have a high enough confidence in my understanding of this part of the
> code), but I'll try if I have time.
> >
> > On Sat, Mar 11, 2017 at 7:06 AM Ed Coleman<de...@etcoleman.com>  wrote:
> >
> >> I had commented on https://issues.apache.org/jira/browse/ACCUMULO-4602
> >> that I often have trouble with this and a few others.
> >>
> >>
> >>
> >> Not sure it makes me feel any better, but for me, this is not "new" to
> >> 1.7.3. I thought it could be due to my virtual-box development
> >> environment, but I've tried running verify on an AWS c4.2xlarge
> >> instance with the same intermittent results. I have had it pass, but
> >> more often than not it fails.
> >>
> >>
> >>
> >> To help decide if 1.7.3-rc0 could be a candidate, I made the following
> >> chart tracking IT issues – and then at one point the KerberosRenewalIT
> >> passed for me (and it passed a few times in a row) and I stopped
> >> updating the chart:
> >>
> >>
> >>
> >>
> >>
> >> Instance Type
> >> AWS1  AWS2  AWS3  OpenBox 1  OpenBox 2  OpenBox 3  Test
> >>  -     x     x        x          -          -      AssignmentThreadsIT.testConcurrentAssignmentPerformance:91
> >>  -     -     -        -          -          x      BadDeleteMarkersCreatedIT>AccumuloClusterIT.teardownCluster:223 » TestTimedOut
> >>  -     x     -        x          -          -      ChaoticBalancerIT.test:80->Object.wait:502->Object.wait:-2 » TestTimedOut test...
> >>  -     -     x        -          -          -      ConditionalWriterIT.testTrace:1476 » TestTimedOut test timed out after 60 seco...
> >>  x     x     x        x          -          -      DurabilityIT.testWriteSpeed:103 log should be faster than flush
> >>  -     -     x        -          -          -      FateStarvationIT.run:79 » Runtime java.lang.RuntimeException: org.apache.zooke...
> >>  x     x     x        x          x          x      KerberosRenewalIT.testReadAndWriteThroughTicketLifetime » TestTimedOut test ti...
> >>  x     -     x        -          -          -      ShellServerIT.trace:1444
> >>  x     -     -        -          x          -      TabletStateChangeIteratorIT.test:100 No tables should need attention expected:<0> but was:<1>
> >>  -     -     -        -          -          -      UnorderedWorkAssignerReplicationIT.dataWasReplicatedToThePeerWithoutDrain:548 » TableOffline
> >>  -     -     -        -          -          x      KerberosReplicationIT.dataReplicatedToCorrectTable:224 » TestTimedOut test tim...
> >>
> >>
> >>
> >> I am seeing the same intermittent failures with 1.7.3-rc1 and 1.7.3-rc2.
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Christopher [mailto:ctubbsii@apache.org]
> >> Sent: Saturday, March 11, 2017 1:53 AM
> >> To: Accumulo Dev List<de...@accumulo.apache.org>
> >> Subject: Re: [VOTE] Accumulo 1.7.3-rc2
> >>
> >>
> >>
> >> +1, reluctantly, due to KerberosRenewalIT failures described below.
> >>
> >>
> >>
> >> Verified hashes/sigs/javadoc jars/source jars/git SHA1/tarball
> >> contents/license stuffs/ITs.
> >>
> >>
> >>
> >> I could not get KerberosRenewalIT to pass at all (I tried half a dozen
> >> times). It keeps timing out. It looks like it's supposed to finish
> >> between 8 and 9 minutes... an insanely long time for a *single* test to
> >> be running, IMO, especially one as narrowly focused as this one
> >> (ShellServerIT, for example, runs about that long, but covers a very
> >> broad spectrum of Accumulo behavior). This test ignores the scaling
> >> parameter, too, so it cannot be scaled with the timeout.factor system
> >> property.
> >>
> >>
> >>
> >> The actual behavior of the test is to just create a table, put in
> >> data, scan it, then delete the table, every 5 seconds for 8 minutes
> >> minimum, under the assumption that the Kerberos ticket will expire at
> >> some point during that time period, and Accumulo will automatically
> >> renew it and continue functioning (the actual condition of expiration
> >> and renewal is never checked). This seems like something that should
> >> be mocked out on the object responsible for detecting and handling
> >> the renewal, and not an 8-9 minute integration test. It's not even clear
> >> from the current test which code is responsible for that (i.e. which
> >> code this test is testing).
> >>
> >> The most recent failure timed out after 9 minutes trying to create an
> >> Accumulo table. This could indicate that there's a problem with the
> >> ticket not renewing when there's an expiration waiting for a FATE
> >> operation... or it could just be that's where the test happened to be
> >> when the 9 minutes were up.
> >>
> >>
> >>
> >> Is anybody else experiencing problems with this test?
> >>
> >>
> >>
> >> In spite of this failure, I'm willing to give my +1 anyway, since I'm
> >> inclined to think this is simply an unreliable test.
> >>
> >>
> >>
> >> On Fri, Mar 10, 2017 at 5:45 PM Keith Turner <keith@deenlo.com> wrote:
> >>
> >>> I also verified the rfile fix.
> >>> On Fri, Mar 10, 2017 at 5:38 PM, Keith Turner <keith@deenlo.com> wrote:
> >>>> +1
> >>>> Did the following:
> >>>>   * Was able to build Fluo against jars in staging repo.
> >>>>   * Sigs check out for tarballs
> >>>>   * No diffs between src tarball and rc2 branch
> >>>>   * Looked at diffs between rc1 and rc2
> >>>> On Fri, Mar 10, 2017 at 7:35 AM, Ed Coleman <dev1@etcoleman.com> wrote:
> >>>>> Accumulo Developers,
> >>>>> Please consider the following candidate for Accumulo 1.7.3. This
> >>>>> candidate contains two changes from 1.7.3-rc1:
> >>>>> - https://issues.apache.org/jira/browse/ACCUMULO-4600 - shell does
> >>>>>   not fall back to accumulo-site.xml when on classpath.
> >>>>> - https://issues.apache.org/jira/browse/ACCUMULO-4597 - NPE from
> >>>>>   RFile PrintInfo
> >>>>> Git Commit:
> >>>>>      38d8a1d139eb21f0c9882be877db1b77aa1a45db
> >>>>> Branch:
> >>>>>      1.7.3-rc2
> >>>>> If this vote passes, a gpg-signed tag will be created using:
> >>>>>      git tag -f -m 'Apache Accumulo 1.7.3' -s rel/1.7.3 38d8a1d139eb21f0c9882be877db1b77aa1a45db
> >>>>> Staging repo:
> >>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065
> >>>>> Source (official release artifact):
> >>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-src.tar.gz
> >>>>> Binary:
> >>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
> >>>>> (Append ".sha1", ".md5", or ".asc" to download the signature/hash
> >>>>> for a given artifact.)
> >>>>> All artifacts were built and staged with:
> >>>>>      mvn release:prepare && mvn release:perform
> >>>>> Signing keys are available at
> >>>>>   https://www.apache.org/dist/accumulo/KEYS
> >>>>> (Expected fingerprint: D87F9F417753D0C88598437EFC4368E0864BCC36)
> >>>>> Release notes (in progress) can be found at:
> >>>>>   https://accumulo.apache.org/release_notes/1.7.3
> >>>>> Please vote one of:
> >>>>> [ ] +1 - I have verified and accept...
> >>>>> [ ] +0 - I have reservations, but not strong enough to vote against...
> >>>>> [ ] -1 - Because..., I do not accept...
> >>>>> ... these artifacts as the 1.7.3 release of Apache Accumulo.
> >>>>> This vote will end on Mon Mar 13 13:00:00 UTC 2017
> >>>>> (Mon Mar 13 09:00:00 EDT 2017 / Mon Mar 13 06:00:00 PDT 2017)
> >>>>> Thanks!
> >>>>> P.S. Hint: download the whole staging repo with
> >>>>>      wget -erobots=off -r -l inf -np -nH \
> >>>>>      https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/
> >>>>>      # note the trailing slash is needed
> >>
> >
>

Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

Posted by Josh Elser <jo...@gmail.com>.
IMO, you have agreement via the votes cast and the vote itself has 
lasted 72 hours. Ship it :)

ASF rules state that the voting period should be at least 72 hours; that 
caveat isn't stated clearly in our docs. You can leave the vote open for 
longer if you want, but, again, I don't think you should.
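
To make the 72-hour arithmetic concrete, here is a minimal Java sketch. The class and method names are made up for illustration and are not part of any ASF or Accumulo tooling. It checks whether a voting window is at least 72 hours long and converts the close time announced in the vote email (Mon Mar 13 13:00:00 UTC 2017) to the two US zones listed there.

```java
import java.time.Duration;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper, for illustration only.
class VoteWindow {
    // Minimum voting period discussed in this thread.
    static final Duration MIN_VOTE = Duration.ofHours(72);

    /** True if the window from open to close is at least MIN_VOTE long. */
    public static boolean isLongEnough(ZonedDateTime open, ZonedDateTime close) {
        return Duration.between(open, close).compareTo(MIN_VOTE) >= 0;
    }

    /** The same instant expressed in another zone, e.g. "America/New_York". */
    public static ZonedDateTime inZone(ZonedDateTime t, String zoneId) {
        return t.withZoneSameInstant(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        // Close time announced in the vote email.
        ZonedDateTime close = ZonedDateTime.of(2017, 3, 13, 13, 0, 0, 0, ZoneOffset.UTC);
        // Latest opening time that still leaves a full 72 hours.
        ZonedDateTime latestOpen = close.minus(MIN_VOTE);

        System.out.println("latest open: " + latestOpen);
        System.out.println("long enough: " + isLongEnough(latestOpen, close));
        System.out.println("Eastern:     " + inZone(close, "America/New_York"));
        System.out.println("Pacific:     " + inZone(close, "America/Los_Angeles"));
    }
}
```

Leaving a vote open longer only moves the close time later, so a check like this matters mainly when the close time is chosen up front.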

Ed Coleman wrote:
> I am inclined to close the vote as passed and press on with the other release artifacts. Just need to read the rest of today's emails first in case there is something there that indicates we should not move forward with 1.7.3.
>
> A few notes, mainly for future discussion / clarification on my part. (Can make these [discuss] subjects, if that would reach a wider / different audience.)
>
> When I read the rules for voting in the Accumulo release guide (http://accumulo.apache.org/contributor/making-release) - in the Vote paragraph, the text says that "Voting shall last 72 hours..." This led me to believe that there was no flexibility to extend the time for the vote. I debated on pushing the release on Friday and having it run through the weekend - or waiting. On the plus side for a Friday AM release was that it gave a full day on Friday and then the weekend in case anyone was in a position to run any of the tests, esp. the long running ones. On the minus, well, the weekend. If there was flexibility to extend the vote, say the text read "shall last at least 72 hours" - I would have delayed the vote close to maybe Tuesday or Wednesday AM.
>
> It will take me a while to get the release notes ready - if we can improve the tests before then, I have no issues with delaying the release and re-voting (if this is allowed). I'm also willing to push a 1.7.4 with any test changes soon - if that is the consensus of the community. I'd like to get this version released, but I also would like it to be as correct as possible - so whatever can facilitate meeting these two somewhat conflicting goals, +1 from me, and of course, whatever is the will of the group.
>
> Ed Coleman
>
> -----Original Message-----
> From: Christopher [mailto:ctubbsii@apache.org]
> Sent: Monday, March 13, 2017 5:49 PM
> To: dev@accumulo.apache.org
> Subject: Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2
>
> Okay Ed, glad to know I'm not the only one, but also glad to know that they all do pass sometimes.
> It looks like the vote thread got enough +1s to pass (if you're ready to call it), and we can investigate these test failures further on that ticket Josh created, or any others which might be relevant.
>
> If I have time later this week, I'll try to track down what code is responsible for the actual renewal, and see why the KerberosRenewalIT might be failing. I'm not too optimistic I'll be able to find the problem (I don't have a high enough confidence in my understanding of this part of the code), but I'll try if I have time.
>
> On Sat, Mar 11, 2017 at 7:06 AM Ed Coleman<de...@etcoleman.com>  wrote:
>
>> I had commented on https://issues.apache.org/jira/browse/ACCUMULO-4602
>> that I often have trouble with this and a few others.
>>
>>
>>
>> Not sure it makes me feel any better, but for me, this is not "new" to
>> 1.7.3. I thought it could be due to my virtual-box development
>> environment, but I've tried running verify on an AWS c4.2xlarge
>> instance with the same intermittent results. I have had it pass, but
>> more often than not it fails.
>>
>>
>>
>> To help decide if 1.7.3-rc0 could be a candidate, I made the following
>> chart tracking IT issues – and then at one point the KerberosRenewalIT
>> passed for me (and it passed a few times in a row) and I stopped
>> updating the chart:
>>
>>
>>
>>
>>
>> Instance Type
>> AWS1  AWS2  AWS3  OpenBox 1  OpenBox 2  OpenBox 3  Test
>>  -     x     x        x          -          -      AssignmentThreadsIT.testConcurrentAssignmentPerformance:91
>>  -     -     -        -          -          x      BadDeleteMarkersCreatedIT>AccumuloClusterIT.teardownCluster:223 » TestTimedOut
>>  -     x     -        x          -          -      ChaoticBalancerIT.test:80->Object.wait:502->Object.wait:-2 » TestTimedOut test...
>>  -     -     x        -          -          -      ConditionalWriterIT.testTrace:1476 » TestTimedOut test timed out after 60 seco...
>>  x     x     x        x          -          -      DurabilityIT.testWriteSpeed:103 log should be faster than flush
>>  -     -     x        -          -          -      FateStarvationIT.run:79 » Runtime java.lang.RuntimeException: org.apache.zooke...
>>  x     x     x        x          x          x      KerberosRenewalIT.testReadAndWriteThroughTicketLifetime » TestTimedOut test ti...
>>  x     -     x        -          -          -      ShellServerIT.trace:1444
>>  x     -     -        -          x          -      TabletStateChangeIteratorIT.test:100 No tables should need attention expected:<0> but was:<1>
>>  -     -     -        -          -          -      UnorderedWorkAssignerReplicationIT.dataWasReplicatedToThePeerWithoutDrain:548 » TableOffline
>>  -     -     -        -          -          x      KerberosReplicationIT.dataReplicatedToCorrectTable:224 » TestTimedOut test tim...
>>
>>
>>
>> I am seeing the same intermittent failures with 1.7.3-rc1 and 1.7.3-rc2.
>>
>>
>>
>> -----Original Message-----
>> From: Christopher [mailto:ctubbsii@apache.org]
>> Sent: Saturday, March 11, 2017 1:53 AM
>> To: Accumulo Dev List<de...@accumulo.apache.org>
>> Subject: Re: [VOTE] Accumulo 1.7.3-rc2
>>
>>
>>
>> +1, reluctantly, due to KerberosRenewalIT failures described below.
>>
>>
>>
>> Verified hashes/sigs/javadoc jars/source jars/git SHA1/tarball
>> contents/license stuffs/ITs.
>>
>>
>>
>> I could not get KerberosRenewalIT to pass at all (I tried half a dozen
>> times). It keeps timing out. It looks like it's supposed to finish
>> between 8 and 9 minutes... an insanely long time for a *single* test to
>> be running, IMO, especially one as narrowly focused as this one
>> (ShellServerIT, for example, runs about that long, but covers a very
>> broad spectrum of Accumulo behavior). This test ignores the scaling
>> parameter, too, so it cannot be scaled with the timeout.factor system
>> property.
>>
>>
>>
>> The actual behavior of the test is to just create a table, put in
>> data, scan it, then delete the table, every 5 seconds for 8 minutes
>> minimum, under the assumption that the Kerberos ticket will expire at
>> some point during that time period, and Accumulo will automatically
>> renew it and continue functioning (the actual condition of expiration
>> and renewal is never checked). This seems like something that should
>> be mocked out on the object responsible for detecting and handling
>> the renewal, and not an 8-9 minute integration test. It's not even clear
>> from the current test which code is responsible for that (i.e. which
>> code this test is testing).
>>
>> The most recent failure timed out after 9 minutes trying to create an
>> Accumulo table. This could indicate that there's a problem with the
>> ticket not renewing when there's an expiration waiting for a FATE
>> operation... or it could just be that's where the test happened to be
>> when the 9 minutes were up.
>>
>>
>>
>> Is anybody else experiencing problems with this test?
>>
>>
>>
>> In spite of this failure, I'm willing to give my +1 anyway, since I'm
>> inclined to think this is simply an unreliable test.
>>
>>
>>
>> On Fri, Mar 10, 2017 at 5:45 PM Keith Turner <keith@deenlo.com> wrote:
>>
>>> I also verified the rfile fix.
>>> On Fri, Mar 10, 2017 at 5:38 PM, Keith Turner <keith@deenlo.com> wrote:
>>>> +1
>>>> Did the following:
>>>>   * Was able to build Fluo against jars in staging repo.
>>>>   * Sigs check out for tarballs
>>>>   * No diffs between src tarball and rc2 branch
>>>>   * Looked at diffs between rc1 and rc2
>>>> On Fri, Mar 10, 2017 at 7:35 AM, Ed Coleman <dev1@etcoleman.com> wrote:
>>>>> Accumulo Developers,
>>>>> Please consider the following candidate for Accumulo 1.7.3. This
>>>>> candidate contains two changes from 1.7.3-rc1:
>>>>> - https://issues.apache.org/jira/browse/ACCUMULO-4600 - shell does
>>>>>   not fall back to accumulo-site.xml when on classpath.
>>>>> - https://issues.apache.org/jira/browse/ACCUMULO-4597 - NPE from
>>>>>   RFile PrintInfo
>>>>> Git Commit:
>>>>>      38d8a1d139eb21f0c9882be877db1b77aa1a45db
>>>>> Branch:
>>>>>      1.7.3-rc2
>>>>> If this vote passes, a gpg-signed tag will be created using:
>>>>>      git tag -f -m 'Apache Accumulo 1.7.3' -s rel/1.7.3 38d8a1d139eb21f0c9882be877db1b77aa1a45db
>>>>> Staging repo:
>>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065
>>>>> Source (official release artifact):
>>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-src.tar.gz
>>>>> Binary:
>>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
>>>>> (Append ".sha1", ".md5", or ".asc" to download the signature/hash
>>>>> for a given artifact.)
>>>>> All artifacts were built and staged with:
>>>>>      mvn release:prepare && mvn release:perform
>>>>> Signing keys are available at
>>>>>   https://www.apache.org/dist/accumulo/KEYS
>>>>> (Expected fingerprint: D87F9F417753D0C88598437EFC4368E0864BCC36)
>>>>> Release notes (in progress) can be found at:
>>>>>   https://accumulo.apache.org/release_notes/1.7.3
>>>>> Please vote one of:
>>>>> [ ] +1 - I have verified and accept...
>>>>> [ ] +0 - I have reservations, but not strong enough to vote against...
>>>>> [ ] -1 - Because..., I do not accept...
>>>>> ... these artifacts as the 1.7.3 release of Apache Accumulo.
>>>>> This vote will end on Mon Mar 13 13:00:00 UTC 2017
>>>>> (Mon Mar 13 09:00:00 EDT 2017 / Mon Mar 13 06:00:00 PDT 2017)
>>>>> Thanks!
>>>>> P.S. Hint: download the whole staging repo with
>>>>>      wget -erobots=off -r -l inf -np -nH \
>>>>>      https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/
>>>>>      # note the trailing slash is needed
>>
>

RE: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

Posted by Ed Coleman <de...@etcoleman.com>.
I am inclined to close the vote as passed and press on with the other release artifacts. Just need to read the rest of today's emails first in case there is something there that indicates we should not move forward with 1.7.3.

A few notes, mainly for future discussion / clarification on my part. (Can make these [discuss] subjects, if that would reach a wider / different audience.)

When I read the rules for voting in the Accumulo release guide (http://accumulo.apache.org/contributor/making-release) - in the Vote paragraph, the text says that "Voting shall last 72 hours..." This led me to believe that there was no flexibility to extend the time for the vote. I debated on pushing the release on Friday and having it run through the weekend - or waiting. On the plus side for a Friday AM release was that it gave a full day on Friday and then the weekend in case anyone was in a position to run any of the tests, esp. the long running ones. On the minus, well, the weekend. If there was flexibility to extend the vote, say the text read "shall last at least 72 hours" - I would have delayed the vote close to maybe Tuesday or Wednesday AM.

It will take me a while to get the release notes ready - if we can improve the tests before then, I have no issues with delaying the release and re-voting (if this is allowed). I'm also willing to push a 1.7.4 with any test changes soon - if that is the consensus of the community. I'd like to get this version released, but I also would like it to be as correct as possible - so whatever can facilitate meeting these two somewhat conflicting goals, +1 from me, and of course, whatever is the will of the group.

Ed Coleman

-----Original Message-----
From: Christopher [mailto:ctubbsii@apache.org] 
Sent: Monday, March 13, 2017 5:49 PM
To: dev@accumulo.apache.org
Subject: Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

Okay Ed, glad to know I'm not the only one, but also glad to know that they all do pass sometimes.
It looks like the vote thread got enough +1s to pass (if you're ready to call it), and we can investigate these test failures further on that ticket Josh created, or any others which might be relevant.

If I have time later this week, I'll try to track down what code is responsible for the actual renewal, and see why the KerberosRenewalIT might be failing. I'm not too optimistic I'll be able to find the problem (I don't have a high enough confidence in my understanding of this part of the code), but I'll try if I have time.

On Sat, Mar 11, 2017 at 7:06 AM Ed Coleman <de...@etcoleman.com> wrote:

> I had commented on https://issues.apache.org/jira/browse/ACCUMULO-4602
> that I often have trouble with this and a few others.
>
>
>
> Not sure it makes me feel any better, but for me, this is not "new" to
> 1.7.3. I thought it could be due to my virtual-box development
> environment, but I've tried running verify on an AWS c4.2xlarge
> instance with the same intermittent results. I have had it pass, but
> more often than not it fails.
>
>
>
> To help decide if 1.7.3-rc0 could be a candidate, I made the following
> chart tracking IT issues – and then at one point the KerberosRenewalIT
> passed for me (and it passed a few times in a row) and I stopped
> updating the chart:
>
>
>
>
>
> Instance Type
> AWS1  AWS2  AWS3  OpenBox 1  OpenBox 2  OpenBox 3  Test
>  -     x     x        x          -          -      AssignmentThreadsIT.testConcurrentAssignmentPerformance:91
>  -     -     -        -          -          x      BadDeleteMarkersCreatedIT>AccumuloClusterIT.teardownCluster:223 » TestTimedOut
>  -     x     -        x          -          -      ChaoticBalancerIT.test:80->Object.wait:502->Object.wait:-2 » TestTimedOut test...
>  -     -     x        -          -          -      ConditionalWriterIT.testTrace:1476 » TestTimedOut test timed out after 60 seco...
>  x     x     x        x          -          -      DurabilityIT.testWriteSpeed:103 log should be faster than flush
>  -     -     x        -          -          -      FateStarvationIT.run:79 » Runtime java.lang.RuntimeException: org.apache.zooke...
>  x     x     x        x          x          x      KerberosRenewalIT.testReadAndWriteThroughTicketLifetime » TestTimedOut test ti...
>  x     -     x        -          -          -      ShellServerIT.trace:1444
>  x     -     -        -          x          -      TabletStateChangeIteratorIT.test:100 No tables should need attention expected:<0> but was:<1>
>  -     -     -        -          -          -      UnorderedWorkAssignerReplicationIT.dataWasReplicatedToThePeerWithoutDrain:548 » TableOffline
>  -     -     -        -          -          x      KerberosReplicationIT.dataReplicatedToCorrectTable:224 » TestTimedOut test tim...
>
>
>
> I am seeing the same intermittent failures with 1.7.3-rc1 and 1.7.3-rc2.
>
>
>
> -----Original Message-----
> From: Christopher [mailto:ctubbsii@apache.org]
> Sent: Saturday, March 11, 2017 1:53 AM
> To: Accumulo Dev List <de...@accumulo.apache.org>
> Subject: Re: [VOTE] Accumulo 1.7.3-rc2
>
>
>
> +1, reluctantly, due to KerberosRenewalIT failures described below.
>
>
>
> Verified hashes/sigs/javadoc jars/source jars/git SHA1/tarball 
> contents/license stuffs/ITs.
>
>
>
> I could not get KerberosRenewalIT to pass at all (I tried half a dozen
> times). It keeps timing out. It looks like it's supposed to finish
> between 8 and 9 minutes... an insanely long time for a *single* test to
> be running, IMO, especially one as narrowly focused as this one
> (ShellServerIT, for example, runs about that long, but covers a very
> broad spectrum of Accumulo behavior). This test ignores the scaling
> parameter, too, so it cannot be scaled with the timeout.factor system
> property.
>
>
>
> The actual behavior of the test is to just create a table, put in
> data, scan it, then delete the table, every 5 seconds for 8 minutes
> minimum, under the assumption that the Kerberos ticket will expire at
> some point during that time period, and Accumulo will automatically
> renew it and continue functioning (the actual condition of expiration
> and renewal is never checked). This seems like something that should
> be mocked out on the object responsible for detecting and handling
> the renewal, and not an 8-9 minute integration test. It's not even clear
> from the current test which code is responsible for that (i.e. which
> code this test is testing).
>
> The most recent failure timed out after 9 minutes trying to create an 
> Accumulo table. This could indicate that there's a problem with the 
> ticket not renewing when there's an expiration waiting for a FATE 
> operation... or it could just be that's where the test happened to be 
> when the 9 minutes were up.
>
>
>
> Is anybody else experiencing problems with this test?
>
>
>
> In spite of this failure, I'm willing to give my +1 anyway, since I'm 
> inclined to think this is simply an unreliable test.
>
>
>
> On Fri, Mar 10, 2017 at 5:45 PM Keith Turner <keith@deenlo.com> wrote:
>
> > I also verified the rfile fix.
> >
> > On Fri, Mar 10, 2017 at 5:38 PM, Keith Turner <keith@deenlo.com> wrote:
> > > +1
> > >
> > > Did the following:
> > >
> > >  * Was able to build Fluo against jars in staging repo.
> > >  * Sigs check out for tarballs
> > >  * No diffs between src tarball and rc2 branch
> > >  * Looked at diffs between rc1 and rc2
> > >
> > > On Fri, Mar 10, 2017 at 7:35 AM, Ed Coleman <dev1@etcoleman.com> wrote:
> > >> Accumulo Developers,
> > >>
> > >> Please consider the following candidate for Accumulo 1.7.3. This
> > >> candidate contains two changes from 1.7.3-rc1:
> > >>
> > >> - https://issues.apache.org/jira/browse/ACCUMULO-4600 - shell does
> > >>   not fall back to accumulo-site.xml when on classpath.
> > >> - https://issues.apache.org/jira/browse/ACCUMULO-4597 - NPE from
> > >>   RFile PrintInfo
> > >>
> > >> Git Commit:
> > >>     38d8a1d139eb21f0c9882be877db1b77aa1a45db
> > >> Branch:
> > >>     1.7.3-rc2
> > >>
> > >> If this vote passes, a gpg-signed tag will be created using:
> > >>     git tag -f -m 'Apache Accumulo 1.7.3' -s rel/1.7.3 38d8a1d139eb21f0c9882be877db1b77aa1a45db
> > >>
> > >> Staging repo:
> > >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065
> > >> Source (official release artifact):
> > >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-src.tar.gz
> > >> Binary:
> > >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
> > >> (Append ".sha1", ".md5", or ".asc" to download the signature/hash
> > >> for a given artifact.)
> > >>
> > >> All artifacts were built and staged with:
> > >>     mvn release:prepare && mvn release:perform
> > >>
> > >> Signing keys are available at
> > >>  https://www.apache.org/dist/accumulo/KEYS
> > >> (Expected fingerprint: D87F9F417753D0C88598437EFC4368E0864BCC36)
> > >>
> > >> Release notes (in progress) can be found at:
> > >>  https://accumulo.apache.org/release_notes/1.7.3
> > >>
> > >> Please vote one of:
> > >> [ ] +1 - I have verified and accept...
> > >> [ ] +0 - I have reservations, but not strong enough to vote against...
> > >> [ ] -1 - Because..., I do not accept...
> > >> ... these artifacts as the 1.7.3 release of Apache Accumulo.
> > >>
> > >> This vote will end on Mon Mar 13 13:00:00 UTC 2017
> > >> (Mon Mar 13 09:00:00 EDT 2017 / Mon Mar 13 06:00:00 PDT 2017)
> > >>
> > >> Thanks!
> > >>
> > >> P.S. Hint: download the whole staging repo with
> > >>     wget -erobots=off -r -l inf -np -nH \
> > >>     https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/
> > >>     # note the trailing slash is needed
>
> > >>
>
> >
>
>


Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2

Posted by Christopher <ct...@apache.org>.
Okay Ed, glad to know I'm not the only one, but also glad to know that they
all do pass sometimes.
It looks like the vote thread got enough +1s to pass (if you're ready to
call it), and we can investigate these test failures further on that ticket
Josh created, or any others which might be relevant.

If I have time later this week, I'll try to track down what code is
responsible for the actual renewal, and see why the KerberosRenewalIT might
be failing. I'm not too optimistic I'll be able to find the problem (I
don't have a high enough confidence in my understanding of this part of the
code), but I'll try if I have time.
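
For reference, the kind of scaling that the quoted note below says KerberosRenewalIT ignores can be sketched in a few lines of Java. This is only an illustration: the `timeout.factor` property name comes from the quoted message, while the class, the method names, and the fallback-to-1 behavior are assumptions and not Accumulo's actual harness code.

```java
import java.time.Duration;

// Hypothetical helper, for illustration only; not Accumulo's test harness.
class ScaledTimeout {
    /** Parses the factor; falls back to 1 when unset, malformed, or below 1 (assumed policy). */
    public static int parseFactor(String raw) {
        if (raw == null) {
            return 1;
        }
        try {
            int factor = Integer.parseInt(raw.trim());
            return factor < 1 ? 1 : factor;
        } catch (NumberFormatException e) {
            return 1;
        }
    }

    /** Multiplies a base timeout by the parsed factor. */
    public static Duration scale(Duration base, String rawFactor) {
        return base.multipliedBy(parseFactor(rawFactor));
    }

    public static void main(String[] args) {
        // The 9-minute figure is the timeout mentioned in the quoted message.
        Duration base = Duration.ofMinutes(9);
        // Run with -Dtimeout.factor=2 to double the limit on slow machines.
        System.out.println(scale(base, System.getProperty("timeout.factor")));
    }
}
```

A test that derives its limit this way can be stretched on slow hardware without editing the test; one with a hard-coded limit cannot.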

On Sat, Mar 11, 2017 at 7:06 AM Ed Coleman <de...@etcoleman.com> wrote:

> I had commented on https://issues.apache.org/jira/browse/ACCUMULO-4602
> that I often have trouble with this and a few others.
>
>
>
> Not sure it makes me feel any better, but for me, this is not "new" to
> 1.7.3. I thought it could be due to my virtual-box development environment,
> but I've tried running verify on an AWS c4.2xlarge instance with the same
> intermittent results. I have had it pass, but more often than not it fails.
>
>
>
> To help decide if 1.7.3-rc0 could be a candidate, I made the following
> chart tracking IT issues – and then at one point the KerberosRenewalIT
> passed for me (and it passed a few times in a row) and I stopped updating
> the chart:
>
> Instance Type
> AWS1 | AWS2 | AWS3 | OpenBox 1 | OpenBox 2 | OpenBox 3 | Test
> -----+------+------+-----------+-----------+-----------+-----
>      | x    | x    | x         |           |           | AssignmentThreadsIT.testConcurrentAssignmentPerformance:91
>      |      |      |           |           | x         | BadDeleteMarkersCreatedIT>AccumuloClusterIT.teardownCluster:223 » TestTimedOut
>      | x    |      | x         |           |           | ChaoticBalancerIT.test:80->Object.wait:502->Object.wait:-2 » TestTimedOut test...
>      |      | x    |           |           |           | ConditionalWriterIT.testTrace:1476 » TestTimedOut test timed out after 60 seco...
> x    | x    | x    | x         |           |           | DurabilityIT.testWriteSpeed:103 log should be faster than flush
>      |      | x    |           |           |           | FateStarvationIT.run:79 » Runtime java.lang.RuntimeException: org.apache.zooke...
> x    | x    | x    | x         | x         | x         | KerberosRenewalIT.testReadAndWriteThroughTicketLifetime » TestTimedOut test ti...
> x    |      | x    |           |           |           | ShellServerIT.trace:1444
> x    |      |      |           | x         |           | TabletStateChangeIteratorIT.test:100 No tables should need attention expected:<0> but was:<1>
>      |      |      |           |           |           | UnorderedWorkAssignerReplicationIT.dataWasReplicatedToThePeerWithoutDrain:548 » TableOffline
>      |      |      |           |           | x         | KerberosReplicationIT.dataReplicatedToCorrectTable:224 » TestTimedOut test tim...
>
> I am seeing the same intermittent failures with 1.7.3-rc1 and 1.7.3-rc2.
>
>
>
> -----Original Message-----
> From: Christopher [mailto:ctubbsii@apache.org]
> Sent: Saturday, March 11, 2017 1:53 AM
> To: Accumulo Dev List <de...@accumulo.apache.org>
> Subject: Re: [VOTE] Accumulo 1.7.3-rc2
>
>
>
> +1, reluctantly, due to KerberosRenewalIT failures described below.
>
>
>
> Verified hashes/sigs/javadoc jars/source jars/git SHA1/tarball
> contents/license stuffs/ITs.
>
>
>
> I could not get KerberosRenewalIT to pass at all (I tried half a dozen
> times). It keeps timing out. It looks like it's supposed to finish between
> 8 and 9 minutes... an insanely long time for a *single* test to be
> running, IMO, especially one as narrowly focused as this one
> (ShellServerIT, for example, runs about that long, but covers a very broad
> spectrum of Accumulo behavior). This test ignores the scaling parameter,
> too, so it cannot be scaled with the timeout.factor system property.
>
>
>
> The actual behavior of the test is to just create a table, put in data,
> scan it, then delete the table, every 5 seconds for 8 minutes minimum,
> under the assumption that the Kerberos ticket will expire at some point
> during that time period, and Accumulo will automatically renew it and
> continue functioning (the actual condition of expiration and renewal is
> never checked). This seems like something that should be mocked out on the
> object responsible for detecting and handling the renewal, and not an
> 8-9 minute integration test. It's not even clear from the current test
> which code is responsible for that (i.e., which code this test is testing).
>
> The most recent failure timed out after 9 minutes trying to create an
> Accumulo table. This could indicate that there's a problem with the ticket
> not renewing when there's an expiration waiting for a FATE operation... or
> it could just be that's where the test happened to be when the 9 minutes
> were up.
>
>
>
> Is anybody else experiencing problems with this test?
>
>
>
> In spite of this failure, I'm willing to give my +1 anyway, since I'm
> inclined to think this is simply an unreliable test.
>
>
>
> On Fri, Mar 10, 2017 at 5:45 PM Keith Turner <keith@deenlo.com> wrote:
>
>
>
> > I also verified the rfile fix.
>
> >
>
> > On Fri, Mar 10, 2017 at 5:38 PM, Keith Turner <keith@deenlo.com> wrote:
> > > +1
> > >
> > > Did the following:
> > >
> > >  * Was able to build Fluo against jars in staging repo.
> > >  * Sigs check out for tarballs
> > >  * No diffs between src tarball and rc2 branch
> > >  * Looked at diffs between rc1 and rc2
>
> > >
>
> > >
>
> > > On Fri, Mar 10, 2017 at 7:35 AM, Ed Coleman <dev1@etcoleman.com> wrote:
>
> > >> Accumulo Developers,
>
> > >>
>
> > >>
>
> > >>
>
> > >> Please consider the following candidate for Accumulo 1.7.3. This candidate
> > >> contains two changes from 1.7.3-rc1:
> > >>
> > >> - https://issues.apache.org/jira/browse/ACCUMULO-4600 - shell does
> > >>   not fall back to accumulo-site.xml when on classpath.
> > >>
> > >> - https://issues.apache.org/jira/browse/ACCUMULO-4597 - NPE from
> > >>   RFile PrintInfo
>
> > >>
>
> > >>
>
> > >>
>
> > >> Git Commit:
>
> > >>
>
> > >>     38d8a1d139eb21f0c9882be877db1b77aa1a45db
>
> > >>
>
> > >> Branch:
>
> > >>
>
> > >>     1.7.3-rc2
>
> > >>
>
> > >>
>
> > >>
>
> > >> If this vote passes, a gpg-signed tag will be created using:
> > >>
> > >>     git tag -f -m 'Apache Accumulo 1.7.3' -s rel/1.7.3 \
> > >>     38d8a1d139eb21f0c9882be877db1b77aa1a45db
>
> > >>
>
> > >>
>
> > >>
>
> > >> Staging repo:
> > >>
> > >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065
> > >>
> > >> Source (official release artifact):
> > >>
> > >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-src.tar.gz
> > >>
> > >> Binary:
> > >>
> > >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
>
> > >>
>
> > >> (Append ".sha1", ".md5", or ".asc" to download the signature/hash
>
> > >> for a given artifact.)
>
> > >>
>
> > >>
>
> > >>
>
> > >> All artifacts were built and staged with:
>
> > >>
>
> > >>     mvn release:prepare && mvn release:perform
>
> > >>
>
> > >>
>
> > >>
>
> > >> Signing keys are available at
> > >> https://www.apache.org/dist/accumulo/KEYS
>
> > >>
>
> > >> (Expected fingerprint: D87F9F417753D0C88598437EFC4368E0864BCC36)
>
> > >>
>
> > >>
>
> > >>
>
> > >> Release notes (in progress) can be found at:
> > >> https://accumulo.apache.org/release_notes/1.7.3
>
> > >>
>
> > >>
>
> > >>
>
> > >> Please vote one of:
>
> > >>
>
> > >> [ ] +1 - I have verified and accept...
>
> > >>
>
> > >> [ ] +0 - I have reservations, but not strong enough to vote against...
>
> > >>
>
> > >> [ ] -1 - Because..., I do not accept...
>
> > >>
>
> > >> ... these artifacts as the 1.7.3 release of Apache Accumulo.
>
> > >>
>
> > >>
>
> > >>
>
> > >> This vote will end on Mon Mar 13 13:00:00 UTC 2017
>
> > >>
>
> > >> (Mon Mar 13 09:00:00 EDT 2017 / Mon Mar 13 06:00:00 PDT 2017)
>
> > >>
>
> > >>
>
> > >>
>
> > >> Thanks!
>
> > >>
>
> > >>
>
> > >>
>
> > >> P.S. Hint: download the whole staging repo with
> > >>
> > >>     wget -erobots=off -r -l inf -np -nH \
> > >>     https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/
> > >>     # note the trailing slash is needed
>
> > >>
>
> >
>
>