Posted to dev@accumulo.apache.org by Sean Busbey <se...@manvsbeard.com> on 2014/03/28 22:22:53 UTC

Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/
-----------------------------------------------------------

Review request for accumulo and kturner.


Bugs: ACCUMULO-2519
    https://issues.apache.org/jira/browse/ACCUMULO-2519


Repository: accumulo


Description
-------

Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.


Diffs
-----

  README 115a9b7 
  server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
  server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
  server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 

Diff: https://reviews.apache.org/r/19804/diff/


Testing
-------

Took a 1.4.5-SNAP cluster

* triggered compactions
* shutdown cluster
* verified waiting transactions
* verified waiting local WALs
* verified /accumulo/version showed 4
* Start upgrade to 1.5.2-SNAP
* verified errors stating that no upgrade happened and pointing back to the docs in: monitor, master logs, tabletserver logs
* verified waiting transactions
* verified waiting local WALs
* verified /accumulo/version showed 4
* Cleared Fate operations
* Start upgrade to 1.5.2-SNAP
* verify no errors shown for upgrade
* verified WALs copied to HDFS
* verified /accumulo/version showed 5
* verified monitor showed normal start up

Running a verify job on existing data now; it should take ~6 hours.


Thanks,

Sean Busbey


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.

> On March 31, 2014, 4:33 p.m., Bill Havanki wrote:
> > server/src/main/java/org/apache/accumulo/server/Accumulo.java, line 283
> > <https://reviews.apache.org/r/19804/diff/1/?file=539925#file539925line283>
> >
> >     Testability: avoid HdfsZooInstance.getInstance(), maybe have instance passed in to method?
> 
> Sean Busbey wrote:
>     I'd rather not muck with the testability of the extant fate stuff as a part of making upgrades work.
> 
> Mike Drob wrote:
>     This whole method being static doesn't sit well with me. I think I would be much happier if it returned a boolean instead of throwing an exception, too.

It doesn't throw an exception; it exits. The internal exception is there to limit the exit handling and logging to a single location, because the transaction store can throw a couple of exceptions. If those happen, we also need to exit.

Returning a boolean would mean adding logging and exit calls in both places this method gets called. What would be the upside?
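
The shape described above, where an internal exception funnels every failure into one logging-and-exit site, can be sketched roughly as follows. This is an illustration only, under stated assumptions: FateGuard, TxLister, AbortException, and the injected exitHandler are hypothetical names invented for this sketch, not the code in the patch, which talks to the real Fate transaction store and exits the process directly.

```java
import java.util.List;
import java.util.function.IntConsumer;

final class FateGuard {
  /** Hypothetical stand-in for the Fate transaction store; listing may fail. */
  public interface TxLister {
    List<Long> list() throws Exception;
  }

  /** Internal signal used only to route every failure to the single exit site below. */
  private static final class AbortException extends Exception {
    AbortException(String msg, Throwable cause) {
      super(msg, cause);
    }
  }

  /**
   * Calls exitHandler with a non-zero status if outstanding transactions exist
   * or the store cannot be read. exitHandler is an assumption made for
   * testability; a server would pass something like System::exit.
   */
  static void abortIfFateTransactions(TxLister store, IntConsumer exitHandler) {
    try {
      List<Long> txids;
      try {
        txids = store.list();
      } catch (Exception e) {
        // Store failures are treated exactly like outstanding transactions.
        throw new AbortException("could not read Fate transactions", e);
      }
      if (!txids.isEmpty()) {
        throw new AbortException(txids.size() + " outstanding Fate transaction(s)", null);
      }
    } catch (AbortException e) {
      // The only place that logs and exits, so callers need no handling of their own.
      System.err.println("Aborting upgrade: " + e.getMessage());
      exitHandler.accept(1);
    }
  }
}
```

With this shape, both call sites (master and tablet server) reduce to a single call, which is the trade-off weighed against returning a boolean.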


- Sean


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39056
-----------------------------------------------------------


On April 2, 2014, 6:06 a.m., Sean Busbey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19804/
> -----------------------------------------------------------
> 
> (Updated April 2, 2014, 6:06 a.m.)
> 
> 
> Review request for accumulo and kturner.
> 
> 
> Bugs: ACCUMULO-2519
>     https://issues.apache.org/jira/browse/ACCUMULO-2519
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.
> 
> 
> Diffs
> -----
> 
>   README 115a9b7 
>   server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
>   server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
>   server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
>   server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 
> 
> Diff: https://reviews.apache.org/r/19804/diff/
> 
> 
> Testing
> -------
> 
> Took a 1.4.5-SNAP cluster
> 
> * loaded test data in a variety of table configs
> * alternate table creation and deletion
> * load additional table to cause !METADATA churn
> * shutdown cluster uncleanly
> * verified waiting Fate transactions (table deletion at success status)
> * verified waiting local WALs
> * verified waiting local WALs include !METADATA table (via LogReader)
> * verified /accumulo/version showed 4
> * Start upgrade to 1.5.2-SNAP
> * verified errors stating that no upgrade happened and pointing back to the docs in: monitor, master logs, tabletserver logs
> * verified same waiting Fate transactions
> * verified same waiting local WALs
> * verified /accumulo/version showed 4
> * Cleared Fate operations
> * Start upgrade to 1.5.2-SNAP
> * wait a terrifyingly long amount of time, check on progress via local logs
> * verify no errors shown for upgrade
> * verified WALs copied to HDFS
> * verified /accumulo/version showed 5
> * verified monitor showed normal start up
> * wait for all tablets to be hosted
> * verify test data
> 
> 
> Thanks,
> 
> Sean Busbey
> 
>


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.

> On March 31, 2014, 4:33 p.m., Bill Havanki wrote:
> > server/src/main/java/org/apache/accumulo/server/Accumulo.java, line 283
> > <https://reviews.apache.org/r/19804/diff/1/?file=539925#file539925line283>
> >
> >     Testability: avoid HdfsZooInstance.getInstance(), maybe have instance passed in to method?

I'd rather not muck with the testability of the extant fate stuff as a part of making upgrades work.


- Sean


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39056
-----------------------------------------------------------


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Mike Drob <md...@mdrob.com>.

> On March 31, 2014, 4:33 p.m., Bill Havanki wrote:
> > server/src/main/java/org/apache/accumulo/server/Accumulo.java, line 283
> > <https://reviews.apache.org/r/19804/diff/1/?file=539925#file539925line283>
> >
> >     Testability: avoid HdfsZooInstance.getInstance(), maybe have instance passed in to method?
> 
> Sean Busbey wrote:
>     I'd rather not muck with the testability of the extant fate stuff as a part of making upgrades work.

This whole method being static doesn't sit well with me. I think I would be much happier if it returned a boolean instead of throwing an exception, too.


- Mike


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39056
-----------------------------------------------------------


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Bill Havanki <bh...@clouderagovt.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39056
-----------------------------------------------------------



server/src/main/java/org/apache/accumulo/server/Accumulo.java
<https://reviews.apache.org/r/19804/#comment71442>

    Testability: avoid HdfsZooInstance.getInstance(), maybe have instance passed in to method?


- Bill Havanki


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.

> On March 29, 2014, 12:26 a.m., kturner wrote:
> > server/src/main/java/org/apache/accumulo/server/master/Master.java, line 313
> > <https://reviews.apache.org/r/19804/diff/1/?file=539926#file539926line313>
> >
> >     I think this check can cause problems. Master.run() starts StatusThread, and StatusThread.run() will indirectly call upgradeMetadata(). After Master.run() starts StatusThread, it seems like it will start Fate and the client service. So it's possible that a 1.5 client could submit a Fate op before upgradeMetadata() is called.
> >     
> >     Also, this check is probably not needed.  upgradeZookeeper() should be called before upgradeMetadata().  Could add a sanity check for this.
> >

I *think* that it's fine, because I think the client service doesn't start until state goes to NORMAL. So while Fate has been started, nothing yet has access to use it. I could clarify things by not starting Fate until the upgradeMetadata happens (or we determine it isn't needed).

It'd probably be simpler to add a sanity check for making sure upgradeZooKeeper happened first though.
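
A sanity check of that kind could look roughly like the sketch below. It is a minimal illustration under assumptions: UpgradeOrder and its flag are hypothetical, and the method bodies are placeholders rather than the real upgrade steps in Master.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class UpgradeOrder {
  // Set once the ZooKeeper upgrade step has run (hypothetical flag).
  private final AtomicBoolean zooKeeperUpgraded = new AtomicBoolean(false);

  void upgradeZookeeper() {
    // ... the ZooKeeper upgrade steps would run here ...
    zooKeeperUpgraded.set(true);
  }

  void upgradeMetadata() {
    // Sanity check: refuse to touch metadata if the ZooKeeper step was skipped.
    if (!zooKeeperUpgraded.get()) {
      throw new IllegalStateException("upgradeMetadata() called before upgradeZookeeper()");
    }
    // ... the metadata upgrade steps would run here ...
  }
}
```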


- Sean


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review38972
-----------------------------------------------------------


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by ke...@deenlo.com.

> On March 29, 2014, 12:26 a.m., kturner wrote:
> > server/src/main/java/org/apache/accumulo/server/master/Master.java, line 313
> > <https://reviews.apache.org/r/19804/diff/1/?file=539926#file539926line313>
> >
> >     I think this check can cause problems. Master.run() starts StatusThread, and StatusThread.run() will indirectly call upgradeMetadata(). After Master.run() starts StatusThread, it seems like it will start Fate and the client service. So it's possible that a 1.5 client could submit a Fate op before upgradeMetadata() is called.
> >     
> >     Also, this check is probably not needed.  upgradeZookeeper() should be called before upgradeMetadata().  Could add a sanity check for this.
> >
> 
> Sean Busbey wrote:
>     I *think* that it's fine, because I think the client service doesn't start until state goes to NORMAL. So while Fate has been started, nothing yet has access to use it. I could clarify things by not starting Fate until the upgradeMetadata happens (or we determine it isn't needed).
>     
>     It'd probably be simpler to add a sanity check for making sure upgradeZooKeeper happened first though.

What's preventing the client service from starting until the state is NORMAL? Looking at the code, Master.run() starts it after starting the StatusThread.


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review38972
-----------------------------------------------------------


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.

> On March 29, 2014, 12:26 a.m., kturner wrote:
> > server/src/main/java/org/apache/accumulo/server/master/Master.java, line 313
> > <https://reviews.apache.org/r/19804/diff/1/?file=539926#file539926line313>
> >
> >     I think this check can cause problems. Master.run() starts StatusThread, and StatusThread.run() will indirectly call upgradeMetadata(). After Master.run() starts StatusThread, it seems like it will start Fate and the client service. So it's possible that a 1.5 client could submit a Fate op before upgradeMetadata() is called.
> >     
> >     Also, this check is probably not needed.  upgradeZookeeper() should be called before upgradeMetadata().  Could add a sanity check for this.
> >
> 
> Sean Busbey wrote:
>     I *think* that it's fine, because I think the client service doesn't start until state goes to NORMAL. So while Fate has been started, nothing yet has access to use it. I could clarify things by not starting Fate until the upgradeMetadata happens (or we determine it isn't needed).
>     
>     It'd probably be simpler to add a sanity check for making sure upgradeZooKeeper happened first though.
> 
> kturner wrote:
>     What's preventing the client service from starting until the state is NORMAL? Looking at the code, Master.run() starts it after starting the StatusThread.

You're correct. I'm testing a new version now that holds Fate until after the upgrade is complete.


- Sean


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review38972
-----------------------------------------------------------


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.

> On March 29, 2014, 12:26 a.m., kturner wrote:
> > server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java, line 3328
> > <https://reviews.apache.org/r/19804/diff/1/?file=539927#file539927line3328>
> >
> >     Seems like there is a possibility of deadlock here.
> >     
> >      1. Master gets past upgradeZookeeper()
> >      2. Client submits FATE op
> >      3. Tablet server aborts copying walogs up
> >      4. Master can not upgradeMetadata because log recovery is needed, stuck.
> >     
> >     This is assuming that what I said in prev comment about Fate starting after upgrade zookeeper is right.  Need to confirm this.
> >     
> >     Some possible options:
> >     
> >      * prevent fate from starting until upgrade is complete
> >      * only abort if there are FATE txs and upgradeZookeeper() has not run.  Would need to look for something that upgradeZookeeper() changes.
> >      * Don't delete walogs after copy if upgrade is not complete.  However would need to delete later then.  
> >     
> >     I'll think about this some more later.

New patch ensures Fate does not start until after upgrade is complete.


- Sean


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review38972
-----------------------------------------------------------


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review38972
-----------------------------------------------------------



server/src/main/java/org/apache/accumulo/server/master/Master.java
<https://reviews.apache.org/r/19804/#comment71354>

    The comment could mention that fate has not been started.
    
    Could add a sanity check to ensure fate was not started.



server/src/main/java/org/apache/accumulo/server/master/Master.java
<https://reviews.apache.org/r/19804/#comment71353>

    I think this check can cause problems. Master.run() starts StatusThread, and StatusThread.run() will indirectly call upgradeMetadata(). After Master.run() starts StatusThread, it seems like it will start Fate and the client service. So it's possible that a 1.5 client could submit a Fate op before upgradeMetadata() is called.
    
    Also, this check is probably not needed.  upgradeZookeeper() should be called before upgradeMetadata().  Could add a sanity check for this.
    



server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java
<https://reviews.apache.org/r/19804/#comment71357>

    Seems like there is a possibility of deadlock here.
    
     1. Master gets past upgradeZookeeper()
     2. Client submits FATE op
     3. Tablet server aborts copying walogs up
     4. Master can not upgradeMetadata because log recovery is needed, stuck.
    
    This is assuming that what I said in prev comment about Fate starting after upgrade zookeeper is right.  Need to confirm this.
    
    Some possible options:
    
     * prevent fate from starting until upgrade is complete
     * only abort if there are FATE txs and upgradeZookeeper() has not run.  Would need to look for something that upgradeZookeeper() changes.
     * Don't delete walogs after copy if upgrade is not complete.  However would need to delete later then.  
    
    I'll think about this some more later.


- kturner


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.

> On April 2, 2014, 7:07 a.m., Mike Drob wrote:
> > server/src/main/java/org/apache/accumulo/server/master/Master.java, line 321
> > <https://reviews.apache.org/r/19804/diff/2/?file=544967#file544967line321>
> >
> >     If there is risk that multiple threads will enter this block, then the second thread will trigger the countdown latch in the else. If there is not (which I don't think there is, because this only gets called from inside of a synchronized method) then why do we perform this check?

It gets called from the Status thread, and we might transition into the state multiple times. The atomic boolean ensures we only ever enter the block once, regardless of how we got there.

That means the countdown latch is only triggered in the else branch of the "is an upgrade needed" check, not in the else branch of the atomic boolean check.
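
To make the shape of that guard concrete, here is a minimal sketch. It is not the code from the diff: the class, field, and method names are all hypothetical, and the upgrade itself is left as a comment. It only shows compareAndSet letting one caller into the block, with the latch released in the inner else.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class UpgradeGuardSketch {
  // Hypothetical stand-ins for the guard and latch discussed above.
  private final AtomicBoolean upgradeCheckDone = new AtomicBoolean(false);
  private final CountDownLatch waitForUpgrade = new CountDownLatch(1);
  private int timesEntered = 0;

  // May be called on every state transition, from any thread.
  void maybeUpgrade(boolean upgradeNeeded) {
    // compareAndSet succeeds for exactly one caller; all later calls skip the block.
    if (upgradeCheckDone.compareAndSet(false, true)) {
      timesEntered++;
      if (upgradeNeeded) {
        // The real code would start the upgrade here and count down once it completes.
      } else {
        // No upgrade needed: release anything waiting on the latch right away.
        waitForUpgrade.countDown();
      }
    }
  }

  int timesEntered() { return timesEntered; }

  long latchCount() { return waitForUpgrade.getCount(); }

  public static void main(String[] args) {
    UpgradeGuardSketch sketch = new UpgradeGuardSketch();
    sketch.maybeUpgrade(false);
    sketch.maybeUpgrade(false); // second transition into the same state
    System.out.println("entered=" + sketch.timesEntered() + " latch=" + sketch.latchCount());
  }
}
```

A second call never reaches either branch, so it cannot touch the latch.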


> On April 2, 2014, 7:07 a.m., Mike Drob wrote:
> > server/src/main/java/org/apache/accumulo/server/master/Master.java, line 352
> > <https://reviews.apache.org/r/19804/diff/2/?file=544967#file544967line352>
> >
> >     Instead of adding a countdown latch, could we have not just waited for this thread to complete? That seems more straightforward.

No, because the thread never exists if we don't need to do an upgrade.
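
A rough illustration of why the latch is easier to wait on than the thread, assuming hypothetical names throughout (this is a sketch, not the Master code): when no upgrade is needed there is no Thread object to join, but the latch can still be released directly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class UpgradeLatchSketch {
  private final CountDownLatch upgradeFinished = new CountDownLatch(1);
  private Thread upgradeThread; // stays null when no upgrade is needed

  void start(boolean upgradeNeeded) {
    if (upgradeNeeded) {
      upgradeThread = new Thread(() -> {
        // Upgrade work would happen here.
        upgradeFinished.countDown();
      }, "upgrade-sketch");
      upgradeThread.start();
    } else {
      // There is no thread to join, so the latch is released directly.
      upgradeFinished.countDown();
    }
  }

  // Waiters use the same call whether or not an upgrade thread was ever created.
  boolean awaitUpgrade(long timeoutMillis) {
    try {
      return upgradeFinished.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  boolean hasUpgradeThread() { return upgradeThread != null; }

  public static void main(String[] args) {
    UpgradeLatchSketch noUpgrade = new UpgradeLatchSketch();
    noUpgrade.start(false);
    System.out.println("released=" + noUpgrade.awaitUpgrade(1000)
        + " thread=" + noUpgrade.hasUpgradeThread());
  }
}
```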


- Sean


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39241
-----------------------------------------------------------


On April 2, 2014, 6:06 a.m., Sean Busbey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19804/
> -----------------------------------------------------------
> 
> (Updated April 2, 2014, 6:06 a.m.)
> 
> 
> Review request for accumulo and kturner.
> 
> 
> Bugs: ACCUMULO-2519
>     https://issues.apache.org/jira/browse/ACCUMULO-2519
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.
> 
> 
> Diffs
> -----
> 
>   README 115a9b7 
>   server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
>   server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
>   server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
>   server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 
> 
> Diff: https://reviews.apache.org/r/19804/diff/
> 
> 
> Testing
> -------
> 
> Took a 1.4.5-SNAP cluster
> 
> * loaded test data in a variety of table configs
> * alternate table creation and deletion
> * load additional table to cause !METADATA churn
> * shutdown cluster uncleanly
> * verified waiting Fate transactions (table deletion at success status)
> * verified waiting local WALs
> * verified waiting local WALs include !METADATA table (via LogReader)
> * verified /accumulo/version showed 4
> * Start upgrade to 1.5.2-SNAP
> * verified errors showing no upgrade and to go back to docs in: monitor, master logs, tabletserver logs
> * verified same waiting Fate transactions
> * verified same waiting local WALs
> * verified /accumulo/version showed 4
> * Cleared Fate operations
> * Start upgrade to 1.5.2-SNAP
> * wait a terrifyingly long amount of time, check on progress via local logs
> * verify no errors shown for upgrade
> * verified WALs copied to HDFS
> * verified /accumulo/version showed 5
> * verified monitor showed normal start up
> * wait for all tablets to be hosted
> * verify test data
> 
> 
> Thanks,
> 
> Sean Busbey
> 
>


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Mike Drob <md...@mdrob.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39241
-----------------------------------------------------------



server/src/main/java/org/apache/accumulo/server/Accumulo.java
<https://reviews.apache.org/r/19804/#comment71600>

    Make a note here, or possibly elsewhere, that completed operations in the "SUCCESS" status will still cause an upgrade to fail.
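
    As a sketch of the behavior such a note would describe (hypothetical names and status values, not the actual Fate types): the check only asks whether any transaction remains, so a transaction that has already succeeded but has not been cleaned up blocks the upgrade just like one in progress.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FateStatusSketch {
  // Hypothetical status values; the real Fate status type may differ.
  enum TxStatus { IN_PROGRESS, FAILED, SUCCESSFUL }

  // Any remaining transaction blocks the upgrade; the status is not consulted.
  static boolean blocksUpgrade(Map<Long, TxStatus> remaining) {
    return !remaining.isEmpty();
  }

  public static void main(String[] args) {
    Map<Long, TxStatus> remaining = new LinkedHashMap<>();
    remaining.put(42L, TxStatus.SUCCESSFUL); // completed, but still listed
    System.out.println("blocks upgrade: " + blocksUpgrade(remaining));
  }
}
```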



server/src/main/java/org/apache/accumulo/server/master/Master.java
<https://reviews.apache.org/r/19804/#comment71603>

    If there is risk that multiple threads will enter this block, then the second thread will trigger the countdown latch in the else. If there is not (which I don't think there is, because this only gets called from inside of a synchronized method) then why do we perform this check?



server/src/main/java/org/apache/accumulo/server/master/Master.java
<https://reviews.apache.org/r/19804/#comment71602>

    Instead of adding a countdown latch, could we have not just waited for this thread to complete? That seems more straightforward.


- Mike Drob


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.

> On April 3, 2014, 8:08 p.m., kturner wrote:
> > server/src/main/java/org/apache/accumulo/server/master/Master.java, line 276
> > <https://reviews.apache.org/r/19804/diff/4/?file=545884#file545884line276>
> >
> >     this should be volatile because upgradeZookeeper() and upgradeMetadata() will be run by separate threads.

From reading the code, I thought upgradeZooKeeper had to happen before the thread that calls upgradeMetadata is created.  Am I reading the code wrong?

Master.run() calls getMasterLock (which is synchronous) and then several lines later creates the status thread and starts it.


- Sean


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39468
-----------------------------------------------------------


On April 2, 2014, 3:10 p.m., Sean Busbey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19804/
> -----------------------------------------------------------
> 
> (Updated April 2, 2014, 3:10 p.m.)
> 
> 
> Review request for accumulo and kturner.
> 
> 
> Bugs: ACCUMULO-2519
>     https://issues.apache.org/jira/browse/ACCUMULO-2519
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.
> 
> 
> Diffs
> -----
> 
>   README 115a9b7 
>   server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
>   server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
>   server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
>   server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 
> 
> Diff: https://reviews.apache.org/r/19804/diff/
> 
> 
> Testing
> -------
> 
> Took a 1.4.5-SNAP cluster
> 
> * loaded test data in a variety of table configs
> * alternate table creation and deletion
> * load additional table to cause !METADATA churn
> * shutdown cluster uncleanly
> * verified waiting Fate transactions (table deletion at success status)
> * verified waiting local WALs
> * verified waiting local WALs include !METADATA table (via LogReader)
> * verified /accumulo/version showed 4
> * Start upgrade to 1.5.2-SNAP
> * verified errors showing no upgrade and to go back to docs in: monitor, master logs, tabletserver logs
> * verified same waiting Fate transactions
> * verified same waiting local WALs
> * verified /accumulo/version showed 4
> * Cleared Fate operations
> * Start upgrade to 1.5.2-SNAP
> * wait a terrifyingly long amount of time, check on progress via local logs
> * verify no errors shown for upgrade
> * verified WALs copied to HDFS
> * verified /accumulo/version showed 5
> * verified monitor showed normal start up
> * wait for all tablets to be hosted
> * verify test data
> 
> 
> Thanks,
> 
> Sean Busbey
> 
>


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by ke...@deenlo.com.

> On April 3, 2014, 8:08 p.m., kturner wrote:
> > server/src/main/java/org/apache/accumulo/server/master/Master.java, line 276
> > <https://reviews.apache.org/r/19804/diff/4/?file=545884#file545884line276>
> >
> >     this should be volatile because upgradeZookeeper() and upgradeMetadata() will be run by separate threads.
> 
> Sean Busbey wrote:
>     in reading the code, I thought upgradeZooKeeper had to happen prior to the thread that calls upgradeMetadata being created.  Am I reading the code wrong?
>     
>     Master.run() calls getMasterLock (which is synchronous) and then several lines later creates the status thread and starts it.

OK.  I did not look at the calling methods.  Actually, setMasterState() is what's synchronized, and that method calls both upgradeZooKeeper() and upgradeMetadata().  So different threads will see any changes other threads make to the boolean, and it does not need to be volatile.

I also think upgradeZooKeeper will be called before upgradeMetadata.
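
For reference, a small sketch of the visibility argument, with hypothetical names rather than the Master fields: as long as every read and write of the flag happens inside methods synchronized on the same object, the Java memory model makes a write by one thread visible to a later read by another, without volatile.

```java
class SynchronizedFlagSketch {
  // Deliberately not volatile: every read and write holds the same monitor.
  private boolean zooKeeperUpgraded = false;

  synchronized void markZooKeeperUpgraded() { zooKeeperUpgraded = true; }

  synchronized boolean isZooKeeperUpgraded() { return zooKeeperUpgraded; }

  // Starts a writer thread, then polls from the calling thread until the
  // write is observed or the timeout passes.
  static boolean readerSeesWrite(long timeoutMillis) {
    SynchronizedFlagSketch state = new SynchronizedFlagSketch();
    Thread writer = new Thread(state::markZooKeeperUpgraded, "writer-sketch");
    writer.start();
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if (state.isZooKeeperUpgraded()) {
        return true;
      }
      Thread.yield();
    }
    return state.isZooKeeperUpgraded();
  }

  public static void main(String[] args) {
    System.out.println("reader saw write: " + readerSeesWrite(5000));
  }
}
```

If the flag were ever read outside a synchronized method, this guarantee would no longer apply and volatile (or an atomic) would be needed.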


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39468
-----------------------------------------------------------


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39468
-----------------------------------------------------------



server/src/main/java/org/apache/accumulo/server/master/Master.java
<https://reviews.apache.org/r/19804/#comment71859>

    this should be volatile because upgradeZookeeper() and upgradeMetadata() will be run by separate threads.


- kturner


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Josh Elser <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39607
-----------------------------------------------------------

Ship it!


- Josh Elser


On April 4, 2014, 10:28 p.m., Sean Busbey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19804/
> -----------------------------------------------------------
> 
> (Updated April 4, 2014, 10:28 p.m.)
> 
> 
> Review request for accumulo and kturner.
> 
> 
> Bugs: ACCUMULO-2519
>     https://issues.apache.org/jira/browse/ACCUMULO-2519
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.
> 
> 
> Diffs
> -----
> 
>   README 115a9b7 
>   server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
>   server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
>   server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
>   server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 
> 
> Diff: https://reviews.apache.org/r/19804/diff/
> 
> 
> Testing
> -------
> 
> Took a 1.4.5-SNAP cluster
> 
> * loaded test data in a variety of table configs
> * alternate table creation and deletion
> * load additional table to cause !METADATA churn
> * shutdown cluster uncleanly
> * verified waiting Fate transactions (table deletion at success status)
> * verified waiting local WALs
> * verified waiting local WALs include !METADATA table (via LogReader)
> * verified /accumulo/version showed 4
> * Start upgrade to 1.5.2-SNAP
> * verified errors showing no upgrade and to go back to docs in: monitor, master logs, tabletserver logs
> * verified same waiting Fate transactions
> * verified same waiting local WALs
> * verified /accumulo/version showed 4
> * Cleared Fate operations
> * Start upgrade to 1.5.2-SNAP
> * wait a terrifyingly long amount of time, check on progress via local logs
> * verify no errors shown for upgrade
> * verified WALs copied to HDFS
> * verified /accumulo/version showed 5
> * verified monitor showed normal start up
> * wait for all tablets to be hosted
> * verify test data
> 
> After merging forward to 1.6.0-SNAPSHOT branch:
> 
> Took 1.5.2-SNAP cluster from above test
> 
> * loaded additional test data in same variety of table configs
> * queue compactions from shell
> * load additional table to cause !METADATA churn
> * shutdown cluster uncleanly
> * verified waiting Fate transactions (compactions)
> * verified WALs in HDFS
> * verified WALs include !METADATA table (via LogReader)
> * verified /accumulo/version showed 5
> * Start upgrade to 1.6.0-SNAP
> * verified errors showing no upgrade and to go back to docs in: monitor, master logs
> * verified same waiting Fate transactions
> * verified same waiting hdfs WALs
> * verified /accumulo/version showed 5
> * Cleared Fate operations
> * start upgrade to 1.6.0-SNAP
> * Wait a good deal of time, though not as long as last time (largely for recovery of ~7GB of WALs)
> * verify no errors shown for upgrade
> * verified /accumulo/version showed 6
> * verify restart cluster post-upgrade doesn't upgrade
> * verified monitor showed normal start up
> * waited for all tablets to be hosted
> * verify test data (both 1.4 written and 1.5 written)
> 
> 
> Thanks,
> 
> Sean Busbey
> 
>


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/
-----------------------------------------------------------

(Updated April 4, 2014, 10:28 p.m.)


Review request for accumulo and kturner.


Bugs: ACCUMULO-2519
    https://issues.apache.org/jira/browse/ACCUMULO-2519


Repository: accumulo


Description
-------

Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.


Diffs
-----

  README 115a9b7 
  server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
  server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
  server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
  server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 

Diff: https://reviews.apache.org/r/19804/diff/


Testing (updated)
-------

Took a 1.4.5-SNAP cluster

* loaded test data in a variety of table configs
* alternate table creation and deletion
* load additional table to cause !METADATA churn
* shutdown cluster uncleanly
* verified waiting Fate transactions (table deletion at success status)
* verified waiting local WALs
* verified waiting local WALs include !METADATA table (via LogReader)
* verified /accumulo/version showed 4
* Start upgrade to 1.5.2-SNAP
* verified errors showing no upgrade and to go back to docs in: monitor, master logs, tabletserver logs
* verified same waiting Fate transactions
* verified same waiting local WALs
* verified /accumulo/version showed 4
* Cleared Fate operations
* Start upgrade to 1.5.2-SNAP
* wait a terrifyingly long amount of time, check on progress via local logs
* verify no errors shown for upgrade
* verified WALs copied to HDFS
* verified /accumulo/version showed 5
* verified monitor showed normal start up
* wait for all tablets to be hosted
* verify test data

After merging forward to 1.6.0-SNAPSHOT branch:

Took 1.5.2-SNAP cluster from above test

* loaded additional test data in same variety of table configs
* queue compactions from shell
* load additional table to cause !METADATA churn
* shutdown cluster uncleanly
* verified waiting Fate transactions (compactions)
* verified WALs in HDFS
* verified WALs include !METADATA table (via LogReader)
* verified /accumulo/version showed 5
* Start upgrade to 1.6.0-SNAP
* verified errors showing no upgrade and to go back to docs in: monitor, master logs
* verified same waiting Fate transactions
* verified same waiting hdfs WALs
* verified /accumulo/version showed 5
* Cleared Fate operations
* start upgrade to 1.6.0-SNAP
* Wait a good deal of time, though not as long as last time (largely for recovery of ~7GB of WALs)
* verify no errors shown for upgrade
* verified /accumulo/version showed 6
* verify restart cluster post-upgrade doesn't upgrade
* verified monitor showed normal start up
* waited for all tablets to be hosted
* verify test data (both 1.4 written and 1.5 written)


Thanks,

Sean Busbey


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39476
-----------------------------------------------------------

Ship it!


- kturner


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/
-----------------------------------------------------------

(Updated April 2, 2014, 3:10 p.m.)


Review request for accumulo and kturner.


Changes
-------

Updated sanity checks and docs per Bill H's feedback.


Bugs: ACCUMULO-2519
    https://issues.apache.org/jira/browse/ACCUMULO-2519


Repository: accumulo


Description
-------

Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.


Diffs (updated)
-----

  README 115a9b7 
  server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
  server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
  server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
  server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 

Diff: https://reviews.apache.org/r/19804/diff/


Testing
-------

Took a 1.4.5-SNAP cluster

* loaded test data in a variety of table configs
* alternate table creation and deletion
* load additional table to cause !METADATA churn
* shutdown cluster uncleanly
* verified waiting Fate transactions (table deletion at success status)
* verified waiting local WALs
* verified waiting local WALs include !METADATA table (via LogReader)
* verified /accumulo/version showed 4
* Start upgrade to 1.5.2-SNAP
* verified errors showing no upgrade and to go back to docs in: monitor, master logs, tabletserver logs
* verified same waiting Fate transactions
* verified same waiting local WALs
* verified /accumulo/version showed 4
* Cleared Fate operations
* Start upgrade to 1.5.2-SNAP
* wait a terrifyingly long amount of time, check on progress via local logs
* verify no errors shown for upgrade
* verified WALs copied to HDFS
* verified /accumulo/version showed 5
* verified monitor showed normal start up
* wait for all tablets to be hosted
* verify test data


Thanks,

Sean Busbey


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.

> On April 2, 2014, 2 p.m., Bill Havanki wrote:
> > README, line 61
> > <https://reviews.apache.org/r/19804/diff/3/?file=545050#file545050line61>
> >
> >     nit: "to delete"

fixed


> On April 2, 2014, 2 p.m., Bill Havanki wrote:
> > server/src/main/java/org/apache/accumulo/server/master/Master.java, line 288
> > <https://reviews.apache.org/r/19804/diff/3/?file=545052#file545052line288>
> >
> >     IllegalStateException would be even better to throw here (and other spots later on).

good idea!
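
Roughly what that could look like, as a sketch only: the method name, parameter, and message wording below are hypothetical and not taken from the diff. The point is that IllegalStateException signals "the cluster is not in a state where an upgrade may proceed" more precisely than a generic exception.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AbortUpgradeSketch {
  // Hypothetical check: refuse to upgrade while old Fate transactions remain.
  static void abortIfFateTransactions(List<Long> outstandingTxIds) {
    if (!outstandingTxIds.isEmpty()) {
      throw new IllegalStateException("Aborting upgrade: " + outstandingTxIds.size()
          + " outstanding Fate transaction(s) from a prior version."
          + " See the upgrade instructions in the README.");
    }
  }

  public static void main(String[] args) {
    abortIfFateTransactions(Collections.<Long>emptyList()); // passes silently
    try {
      abortIfFateTransactions(Arrays.asList(1L, 2L));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```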


- Sean


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39251
-----------------------------------------------------------


On April 2, 2014, 7:50 a.m., Sean Busbey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19804/
> -----------------------------------------------------------
> 
> (Updated April 2, 2014, 7:50 a.m.)
> 
> 
> Review request for accumulo and kturner.
> 
> 
> Bugs: ACCUMULO-2519
>     https://issues.apache.org/jira/browse/ACCUMULO-2519
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.
> 
> 
> Diffs
> -----
> 
>   README 115a9b7 
>   server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
>   server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
>   server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
>   server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 
> 
> Diff: https://reviews.apache.org/r/19804/diff/
> 
> 
> Testing
> -------
> 
> Took a 1.4.5-SNAP cluster
> 
> * loaded test data in a variety of table configs
> * alternate table creation and deletion
> * load additional table to cause !METADATA churn
> * shutdown cluster uncleanly
> * verified waiting Fate transactions (table deletion at success status)
> * verified waiting local WALs
> * verified waiting local WALs include !METADATA table (via LogReader)
> * verified /accumulo/version showed 4
> * Start upgrade to 1.5.2-SNAP
> * verified errors showing no upgrade and to go back to docs in: monitor, master logs, tabletserver logs
> * verified same waiting Fate transactions
> * verified same waiting local WALs
> * verified /accumulo/version showed 4
> * Cleared Fate operations
> * Start upgrade to 1.5.2-SNAP
> * wait a terrifyingly long amount of time, check on progress via local logs
> * verify no errors shown for upgrade
> * verified WALs copied to HDFS
> * verified /accumulo/version showed 5
> * verified monitor showed normal start up
> * wait for all tablets to be hosted
> * verify test data
> 
> 
> Thanks,
> 
> Sean Busbey
> 
>


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Bill Havanki <bh...@clouderagovt.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review39251
-----------------------------------------------------------



README
<https://reviews.apache.org/r/19804/#comment71613>

    nit: "to delete"



server/src/main/java/org/apache/accumulo/server/master/Master.java
<https://reviews.apache.org/r/19804/#comment71614>

    IllegalStateException would be even better to throw here (and other spots later on).
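
    A minimal sketch of what this suggestion might look like. The class and method names are hypothetical stand-ins, not the actual code in Master.java, and the pending transaction ids are passed in directly rather than read from the Fate store:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not the real Accumulo upgrade code. */
public class UpgradeGuard {

  /**
   * Refuses to proceed with an upgrade while Fate transactions remain.
   * In a real server the ids would be read from the Fate store; here they
   * are a parameter so the sketch stays self-contained.
   */
  public static void abortIfFateTransactions(List<Long> pendingTxIds) {
    if (!pendingTxIds.isEmpty()) {
      // The server is in a state where upgrading is not legal, which is
      // what IllegalStateException is meant to signal.
      throw new IllegalStateException("Aborting upgrade: " + pendingTxIds.size()
          + " outstanding Fate transaction(s) from a prior version."
          + " See the README for upgrade instructions.");
    }
  }

  public static void main(String[] args) {
    // No pending transactions: the check passes silently.
    abortIfFateTransactions(Collections.<Long>emptyList());
    try {
      // One pending transaction: the check throws.
      abortIfFateTransactions(Arrays.asList(42L));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```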


- Bill Havanki


On April 2, 2014, 3:50 a.m., Sean Busbey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19804/
> -----------------------------------------------------------
> 
> (Updated April 2, 2014, 3:50 a.m.)
> 
> 
> Review request for accumulo and kturner.
> 
> 
> Bugs: ACCUMULO-2519
>     https://issues.apache.org/jira/browse/ACCUMULO-2519
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.
> 
> 
> Diffs
> -----
> 
>   README 115a9b7 
>   server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
>   server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
>   server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
>   server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 
> 
> Diff: https://reviews.apache.org/r/19804/diff/
> 
> 
> Testing
> -------
> 
> Took a 1.4.5-SNAP cluster
> 
> * loaded test data in a variety of table configs
> * alternate table creation and deletion
> * load additional table to cause !METADATA churn
> * shutdown cluster uncleanly
> * verified waiting Fate transactions (table deletion at success status)
> * verified waiting local WALs
> * verified waiting local WALs include !METADATA table (via LogReader)
> * verified /accumulo/version showed 4
> * Start upgrade to 1.5.2-SNAP
> * verified errors showing no upgrade and to go back to docs in: monitor, master logs, tabletserver logs
> * verified same waiting Fate transactions
> * verified same waiting local WALs
> * verified /accumulo/version showed 4
> * Cleared Fate operations
> * Start upgrade to 1.5.2-SNAP
> * wait a terrifyingly long amount of time, check on progress via local logs
> * verify no errors shown for upgrade
> * verified WALs copied to HDFS
> * verified /accumulo/version showed 5
> * verified monitor showed normal start up
> * wait for all tablets to be hosted
> * verify test data
> 
> 
> Thanks,
> 
> Sean Busbey
> 
>


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/
-----------------------------------------------------------

(Updated April 2, 2014, 7:50 a.m.)


Review request for accumulo and kturner.


Changes
-------

Updated docs to call out that completed Fate operations will block an upgrade and can be deleted.


Bugs: ACCUMULO-2519
    https://issues.apache.org/jira/browse/ACCUMULO-2519


Repository: accumulo


Description
-------

Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.


Diffs (updated)
-----

  README 115a9b7 
  server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
  server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
  server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
  server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 

Diff: https://reviews.apache.org/r/19804/diff/


Testing
-------

Took a 1.4.5-SNAP cluster

* loaded test data in a variety of table configs
* alternate table creation and deletion
* load additional table to cause !METADATA churn
* shutdown cluster uncleanly
* verified waiting Fate transactions (table deletion at success status)
* verified waiting local WALs
* verified waiting local WALs include !METADATA table (via LogReader)
* verified /accumulo/version showed 4
* Start upgrade to 1.5.2-SNAP
* verified errors showing no upgrade and to go back to docs in: monitor, master logs, tabletserver logs
* verified same waiting Fate transactions
* verified same waiting local WALs
* verified /accumulo/version showed 4
* Cleared Fate operations
* Start upgrade to 1.5.2-SNAP
* wait a terrifyingly long amount of time, check on progress via local logs
* verify no errors shown for upgrade
* verified WALs copied to HDFS
* verified /accumulo/version showed 5
* verified monitor showed normal start up
* wait for all tablets to be hosted
* verify test data


Thanks,

Sean Busbey


Re: Review Request 19804: ACCUMULO-2519 Aborts upgrade if there are Fate transactions from an old version.

Posted by Sean Busbey <se...@manvsbeard.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/
-----------------------------------------------------------

(Updated April 2, 2014, 6:06 a.m.)


Review request for accumulo and kturner.


Changes
-------

Updated implementation to make sure that Fate isn't started until after we finish upgrading.
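
One way to picture this ordering is a latch that the upgrade releases and the Fate startup waits on. This is only a sketch of the idea; DeferredFateStart and its methods are hypothetical names, and the patch itself may enforce the ordering differently:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical illustration of "no Fate until the upgrade has finished". */
public class DeferredFateStart {
  private final CountDownLatch upgradeComplete = new CountDownLatch(1);
  private final List<String> events =
      Collections.synchronizedList(new ArrayList<String>());

  /** Performs the (simulated) upgrade, then releases anyone waiting on it. */
  public void runUpgrade() {
    events.add("upgrade");
    upgradeComplete.countDown();
  }

  /** Blocks until the upgrade has finished, then starts (simulated) Fate. */
  public void startFate() throws InterruptedException {
    upgradeComplete.await();
    events.add("fate");
  }

  public List<String> events() {
    return new ArrayList<String>(events);
  }

  /** Starts Fate on another thread first, then upgrades; returns the event order. */
  public static List<String> demo() {
    final DeferredFateStart server = new DeferredFateStart();
    Thread fateStarter = new Thread(() -> {
      try {
        server.startFate();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    fateStarter.start();
    server.runUpgrade();
    try {
      fateStarter.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
    return server.events();
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [upgrade, fate]
  }
}
```

Even though the Fate thread is started first, it cannot record its event until the upgrade has counted the latch down.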


Bugs: ACCUMULO-2519
    https://issues.apache.org/jira/browse/ACCUMULO-2519


Repository: accumulo


Description
-------

Adds "make sure Fate has no outstanding items" to the upgrade instructions. Makes sure the master and tabletservers don't take upgrade steps if they see fate ops waiting.


Diffs (updated)
-----

  README 115a9b7 
  server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
  server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
  server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
  server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55 

Diff: https://reviews.apache.org/r/19804/diff/


Testing (updated)
-------

Took a 1.4.5-SNAP cluster

* loaded test data in a variety of table configs
* alternate table creation and deletion
* load additional table to cause !METADATA churn
* shutdown cluster uncleanly
* verified waiting Fate transactions (table deletion at success status)
* verified waiting local WALs
* verified waiting local WALs include !METADATA table (via LogReader)
* verified /accumulo/version showed 4
* Start upgrade to 1.5.2-SNAP
* verified errors showing no upgrade and to go back to docs in: monitor, master logs, tabletserver logs
* verified same waiting Fate transactions
* verified same waiting local WALs
* verified /accumulo/version showed 4
* Cleared Fate operations
* Start upgrade to 1.5.2-SNAP
* wait a terrifyingly long amount of time, check on progress via local logs
* verify no errors shown for upgrade
* verified WALs copied to HDFS
* verified /accumulo/version showed 5
* verified monitor showed normal start up
* wait for all tablets to be hosted
* verify test data
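
The version checks in the steps above can be sketched as a small gate. The values 4 and 5 come from the test notes; the class, constants, and method names are hypothetical and are not taken from the Accumulo source:

```java
/** Hypothetical model of the behavior the test steps verify. */
public class DataVersionGate {
  // From the test notes: /accumulo/version shows 4 before the upgrade, 5 after.
  public static final int PREV_DATA_VERSION = 4;
  public static final int DATA_VERSION = 5;

  /** An upgrade is only attempted when the persisted version is the previous one. */
  public static boolean needsUpgrade(int persistedVersion) {
    return persistedVersion == PREV_DATA_VERSION;
  }

  /**
   * Returns the version that would be persisted after startup. Throws, leaving
   * the persisted version untouched, if Fate operations are still outstanding.
   */
  public static int startUp(int persistedVersion, int outstandingFateOps) {
    if (!needsUpgrade(persistedVersion)) {
      return persistedVersion;
    }
    if (outstandingFateOps > 0) {
      throw new IllegalStateException(
          "Outstanding Fate operations; refusing to upgrade. See the upgrade docs.");
    }
    return DATA_VERSION;
  }

  public static void main(String[] args) {
    int version = PREV_DATA_VERSION;
    try {
      version = startUp(version, 1); // first attempt: Fate ops still waiting
    } catch (IllegalStateException e) {
      System.out.println("blocked, version still " + version);
    }
    version = startUp(version, 0); // second attempt: Fate ops cleared
    System.out.println("upgraded, version now " + version);
  }
}
```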


Thanks,

Sean Busbey