Posted to dev@accumulo.apache.org by ke...@deenlo.com on 2014/08/19 19:50:14 UTC

Review Request 24855: ACCUMULO-1454 design doc

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/
-----------------------------------------------------------

Review request for accumulo.


Bugs: ACCUMULO-1454
    https://issues.apache.org/jira/browse/ACCUMULO-1454


Repository: accumulo


Description
-------

Posting ACCUMULO-1454 design doc for review


Diffs
-----

  docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc PRE-CREATION 

Diff: https://reviews.apache.org/r/24855/diff/


Testing
-------


Thanks,

kturner


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Josh Elser <jo...@gmail.com>.

> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would actually have over what is currently possible with Accumulo as it stands.
> > 
> > Right now, you can force tablets to migrate by stopping a tserver. This goes back through the balancer, so you have a bit of churn in however many "rounds" the Balancer takes to choose where those tablets should go, and then for the master to process the necessary assignments for each tserver. How I'm seeing it described is that the only piece of the puzzle that we're making better is removing the migration components in favor of letting the user control this directly. How much does a "smart" Balancer implementation close the gap between the user providing migrations in regards to performance? Also, how does removing the Balancer from the equation change the wall time to get a tablet assigned (is it significant)?
> > 
> > We have to also understand that while we can decompose the problem into some simple primitives, I believe this approach is still a rather difficult distributed state problem that I'm worried is being over-architected. My $0.02.
> 
> Josh Elser wrote:
>     For context, I was reading about HBase's support on the subject and found http://hbase.apache.org/book/node.management.html. Their general approach is to provide a graceful shutdown for regionservers. This is still subject to problems when a large number of servers are stopped at one time. To alleviate some of this pain, they use ZK to store which servers are currently in a "draining state" to avoid new assignments to those nodes -- "[...] decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining. Marking RegionServers to be in the draining state prevents this from happening".
>
> kturner wrote:
>     An alternative to this design is one that Mike mentioned on the issue: temporarily replace the balancer. I am thinking that providing these primitives for manipulating tablets will allow an administrator to quickly script a one-off solution to a problem, in addition to solving the rolling restart problem. You do not get this quick flexibility with writing a new balancer.
>
>     Killing tablet servers is a solution. I think it would be nice to have a solution that avoids log recovery, minimizes downtime of individual tablets, preserves locality, and is easy to use. It does not have to be this solution. Without additional scripts, the primary use case in 1454 would not be easy to use. A balancer alone would not be enough to achieve the goal of migrating tablets between old and new tservers on the same node. However, a balancer plus tserver states like the ones you mentioned from HBase may provide enough. We should probably explore the balancer option a bit more.
> 
> kturner wrote:
>     One other thing I was thinking about was that you can not make assumptions about the environment.  Users may not use the Accumulo scripts to start and stop tservers.

I think there would be merit in enumerating what would be needed by a custom Balancer. Is it really something that would need to be written on a per-instance basis, or is there something we could provide that would be more conducive to "heavy" tserver churn?

I would definitely not advocate killing tservers. A graceful shutdown would be much more desirable. We get a little bit of help here by the client-side scan retries for not having to quiesce all reads to a tablet, but that could still introduce more latency for a query (e.g. lots of filtering over a large row).

As mentioned about concerns with the final two-tservers-per-node approach, I'm not entirely convinced that "sibling" tservers is worth the complexity. We really don't have that much locality in how we use HDFS now. Is trying to keep all of the tablets assigned on the same node going to make things much more efficient over assigning them to nodes elsewhere? I don't even have a good grasp for what these perf numbers would be at a high level.


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51006
-----------------------------------------------------------


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.

> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would actually have over what is currently possible with Accumulo as it stands.
> > 
> > Right now, you can force tablets to migrate by stopping a tserver. This goes back through the balancer, so you have a bit of churn in however many "rounds" the Balancer takes to choose where those tablets should go, and then for the master to process the necessary assignments for each tserver. How I'm seeing it described is that the only piece of the puzzle that we're making better is removing the migration components in favor of letting the user control this directly. How much does a "smart" Balancer implementation close the gap between the user providing migrations in regards to performance? Also, how does removing the Balancer from the equation change the wall time to get a tablet assigned (is it significant)?
> > 
> > We have to also understand that while we can decompose the problem into some simple primitives, I believe this approach is still a rather difficult distributed state problem that I'm worried is being over-architected. My $0.02.
> 
> Josh Elser wrote:
>     For context, I was reading about HBase's support on the subject and found http://hbase.apache.org/book/node.management.html. Their general approach is to provide a graceful shutdown for regionservers. This is still subject to problems when a large number of servers are stopped at one time. To alleviate some of this pain, they use ZK to store which servers are currently in a "draining state" to avoid new assignments to those nodes -- "[...] decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining. Marking RegionServers to be in the draining state prevents this from happening".
>
> kturner wrote:
>     An alternative to this design is one that Mike mentioned on the issue: temporarily replace the balancer. I am thinking that providing these primitives for manipulating tablets will allow an administrator to quickly script a one-off solution to a problem, in addition to solving the rolling restart problem. You do not get this quick flexibility with writing a new balancer.
>
>     Killing tablet servers is a solution. I think it would be nice to have a solution that avoids log recovery, minimizes downtime of individual tablets, preserves locality, and is easy to use. It does not have to be this solution. Without additional scripts, the primary use case in 1454 would not be easy to use. A balancer alone would not be enough to achieve the goal of migrating tablets between old and new tservers on the same node. However, a balancer plus tserver states like the ones you mentioned from HBase may provide enough. We should probably explore the balancer option a bit more.
> 
> kturner wrote:
>     One other thing I was thinking about was that you can not make assumptions about the environment.  Users may not use the Accumulo scripts to start and stop tservers.
> 
> Josh Elser wrote:
>     I think there would be merit in enumerating what would be needed by a custom Balancer. Is it really something that would need to be written on a per-instance basis, or is there something we could provide that would be more conducive to "heavy" tserver churn?
>     
>     I would definitely not advocate killing tservers. A graceful shutdown would be much more desirable. We get a little bit of help here by the client-side scan retries for not having to quiesce all reads to a tablet, but that could still introduce more latency for a query (e.g. lots of filtering over a large row).
>     
>     As mentioned about concerns with the final two-tservers-per-node approach, I'm not entirely convinced that "sibling" tservers is worth the complexity. We really don't have that much locality in how we use HDFS now. Is trying to keep all of the tablets assigned on the same node going to make things much more efficient over assigning them to nodes elsewhere? I don't even have a good grasp for what these perf numbers would be at a high level.
> 
> kturner wrote:
>     Eric looked into locality once when running continuous ingest and found that ~50% of tablets had local data.    This matches expectations as the default balancer will try to migrate one child after a split.
>     
>     The sibling tserver concept may be too complex to implement.  Sigh, but it's so cool :)
>
> Josh Elser wrote:
>     Clarification on what I meant by locality: we don't consider HDFS block locations when we choose where Tablets get assigned, AFAIK. Yes, we'll have locality when we're slamming Accumulo with ingest, but once we start agitating at any reasonable rate, that's going to be lost.
>     
>     Requiring sibling tservers also implies that you have ample extra resources on a node which is absolutely not going to be the case for most systems. It would be nice, but it sounds to me like a one-off from what would be the norm. :)

On the mailing list, Adam thought locality reached 90% in a long-running CI test. I need to ask Eric what he saw. It seems plausible that, in a stable system, locality would increase as the time since a split increases.


- kturner


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Adam Fuchs <af...@apache.org>.
On Tue, Aug 19, 2014 at 4:17 PM, <ke...@deenlo.com> wrote:
>
> ...
> Eric looked into locality once when running continuous ingest and found that ~50% of tablets had local data.    This matches expectations as the default balancer will try to migrate one child after a split.

I thought he found it was more like 90%+ for a long running ingest
test. Citation needed?

Adam

Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.

> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would actually have over what is currently possible with Accumulo as it stands.
> > 
> > Right now, you can force tablets to migrate by stopping a tserver. This goes back through the balancer, so you have a bit of churn in however many "rounds" the Balancer takes to choose where those tablets should go, and then for the master to process the necessary assignments for each tserver. How I'm seeing it described is that the only piece of the puzzle that we're making better is removing the migration components in favor of letting the user control this directly. How much does a "smart" Balancer implementation close the gap between the user providing migrations in regards to performance? Also, how does removing the Balancer from the equation change the wall time to get a tablet assigned (is it significant)?
> > 
> > We have to also understand that while we can decompose the problem into some simple primitives, I believe this approach is still a rather difficult distributed state problem that I'm worried is being over-architected. My $0.02.
> 
> Josh Elser wrote:
>     For context, I was reading about HBase's support on the subject and found http://hbase.apache.org/book/node.management.html. Their general approach is to provide a graceful shutdown for regionservers. This is still subject to problems when a large number of servers are stopped at one time. To alleviate some of this pain, they use ZK to store which servers are currently in a "draining state" to avoid new assignments to those nodes -- "[...] decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining. Marking RegionServers to be in the draining state prevents this from happening".
>
> kturner wrote:
>     An alternative to this design is one that Mike mentioned on the issue: temporarily replace the balancer. I am thinking that providing these primitives for manipulating tablets will allow an administrator to quickly script a one-off solution to a problem, in addition to solving the rolling restart problem. You do not get this quick flexibility with writing a new balancer.
>
>     Killing tablet servers is a solution. I think it would be nice to have a solution that avoids log recovery, minimizes downtime of individual tablets, preserves locality, and is easy to use. It does not have to be this solution. Without additional scripts, the primary use case in 1454 would not be easy to use. A balancer alone would not be enough to achieve the goal of migrating tablets between old and new tservers on the same node. However, a balancer plus tserver states like the ones you mentioned from HBase may provide enough. We should probably explore the balancer option a bit more.
> 
> kturner wrote:
>     One other thing I was thinking about was that you can not make assumptions about the environment.  Users may not use the Accumulo scripts to start and stop tservers.
> 
> Josh Elser wrote:
>     I think there would be merit in enumerating what would be needed by a custom Balancer. Is it really something that would need to be written on a per-instance basis, or is there something we could provide that would be more conducive to "heavy" tserver churn?
>     
>     I would definitely not advocate killing tservers. A graceful shutdown would be much more desirable. We get a little bit of help here by the client-side scan retries for not having to quiesce all reads to a tablet, but that could still introduce more latency for a query (e.g. lots of filtering over a large row).
>     
>     As mentioned about concerns with the final two-tservers-per-node approach, I'm not entirely convinced that "sibling" tservers is worth the complexity. We really don't have that much locality in how we use HDFS now. Is trying to keep all of the tablets assigned on the same node going to make things much more efficient over assigning them to nodes elsewhere? I don't even have a good grasp for what these perf numbers would be at a high level.

Eric looked into locality once when running continuous ingest and found that ~50% of tablets had local data.    This matches expectations as the default balancer will try to migrate one child after a split.
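To make the "~50% of tablets had local data" figure concrete, here is a minimal sketch of how such a fraction could be computed. It is not how Eric measured it. The class name, the method, and the definition of "local" (the hosting node holds at least one replica of the tablet's file blocks) are all assumptions for illustration.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Accumulo: estimates tablet data locality. */
final class LocalityReport {

  /**
   * @param hostingNode tablet id -> node currently hosting the tablet
   * @param dataNodes   tablet id -> nodes holding a replica of the tablet's file blocks
   * @return fraction of tablets whose hosting node also holds their data, or 0.0 if there are no tablets
   */
  static double localFraction(Map<String, String> hostingNode, Map<String, Set<String>> dataNodes) {
    if (hostingNode.isEmpty()) {
      return 0.0;
    }
    int local = 0;
    for (Map.Entry<String, String> e : hostingNode.entrySet()) {
      // A tablet with no known block locations counts as non-local.
      Set<String> replicas = dataNodes.getOrDefault(e.getKey(), Set.of());
      if (replicas.contains(e.getValue())) {
        local++;
      }
    }
    return (double) local / hostingNode.size();
  }

  public static void main(String[] args) {
    // A parent tablet on node1 splits; one child stays on node1, the other is
    // migrated to node4, which holds none of the parent's blocks.
    Set<String> parentData = Set.of("node1", "node2", "node3");
    Map<String, String> hosting = Map.of("childA", "node1", "childB", "node4");
    Map<String, Set<String>> data = Map.of("childA", parentData, "childB", parentData);
    System.out.println(localFraction(hosting, data)); // prints 0.5
  }
}
```

In this toy case one of the two children keeps local data, which is the 50% I would expect right after a split. If the migrated child happens to land on a node that holds a replica, the fraction would be higher.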

The sibling tserver concept may be too complex to implement.  Sigh, but it's so cool :)


- kturner


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.

> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would actually have over what is currently possible with Accumulo as it stands.
> > 
> > Right now, you can force tablets to migrate by stopping a tserver. This goes back through the balancer, so you have a bit of churn in however many "rounds" the Balancer takes to choose where those tablets should go, and then for the master to process the necessary assignments for each tserver. How I'm seeing it described is that the only piece of the puzzle that we're making better is removing the migration components in favor of letting the user control this directly. How much does a "smart" Balancer implementation close the gap between the user providing migrations in regards to performance? Also, how does removing the Balancer from the equation change the wall time to get a tablet assigned (is it significant)?
> > 
> > We have to also understand that while we can decompose the problem into some simple primitives, I believe this approach is still a rather difficult distributed state problem that I'm worried is being over-architected. My $0.02.
> 
> Josh Elser wrote:
>     For context, I was reading about HBase's support on the subject and found http://hbase.apache.org/book/node.management.html. Their general approach is to provide a graceful shutdown for regionservers. This is still subject to problems when a large number of servers are stopped at one time. To alleviate some of this pain, they use ZK to store which servers are currently in a "draining state" to avoid new assignments to those nodes -- "[...] decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining. Marking RegionServers to be in the draining state prevents this from happening".
>
> kturner wrote:
>     An alternative to this design is one that Mike mentioned on the issue: temporarily replace the balancer. I am thinking that providing these primitives for manipulating tablets will allow an administrator to quickly script a one-off solution to a problem, in addition to solving the rolling restart problem. You do not get this quick flexibility with writing a new balancer.
>
>     Killing tablet servers is a solution. I think it would be nice to have a solution that avoids log recovery, minimizes downtime of individual tablets, preserves locality, and is easy to use. It does not have to be this solution. Without additional scripts, the primary use case in 1454 would not be easy to use. A balancer alone would not be enough to achieve the goal of migrating tablets between old and new tservers on the same node. However, a balancer plus tserver states like the ones you mentioned from HBase may provide enough. We should probably explore the balancer option a bit more.

One other thing I was thinking about was that you can not make assumptions about the environment.  Users may not use the Accumulo scripts to start and stop tservers.


- kturner


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.

> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would actually have over what is currently possible with Accumulo as it stands.
> > 
> > Right now, you can force tablets to migrate by stopping a tserver. This goes back through the balancer, so you have a bit of churn in however many "rounds" the Balancer takes to choose where those tablets should go, and then for the master to process the necessary assignments for each tserver. How I'm seeing it described is that the only piece of the puzzle that we're making better is removing the migration components in favor of letting the user control this directly. How much does a "smart" Balancer implementation close the gap between the user providing migrations in regards to performance? Also, how does removing the Balancer from the equation change the wall time to get a tablet assigned (is it significant)?
> > 
> > We have to also understand that while we can decompose the problem into some simple primitives, I believe this approach is still a rather difficult distributed state problem that I'm worried is being over-architected. My $0.02.
> 
> Josh Elser wrote:
>     For context, I was reading about HBase's support on the subject and found http://hbase.apache.org/book/node.management.html. Their general approach is to provide a graceful shutdown for regionservers. This is still subject to problems when a large number of servers are stopped at one time. To alleviate some of this pain, they use ZK to store which servers are currently in a "draining state" to avoid new assignments to those nodes -- "[...] decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining. Marking RegionServers to be in the draining state prevents this from happening".

An alternative to this design is one that Mike mentioned on the issue: temporarily replace the balancer. I am thinking that providing these primitives for manipulating tablets will allow an administrator to quickly script a one-off solution to a problem, in addition to solving the rolling restart problem. You do not get this quick flexibility with writing a new balancer.

Killing tablet servers is a solution. I think it would be nice to have a solution that avoids log recovery, minimizes downtime of individual tablets, preserves locality, and is easy to use. It does not have to be this solution. Without additional scripts, the primary use case in 1454 would not be easy to use. A balancer alone would not be enough to achieve the goal of migrating tablets between old and new tservers on the same node. However, a balancer plus tserver states like the ones you mentioned from HBase may provide enough. We should probably explore the balancer option a bit more.
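As a rough illustration of the kind of one-off script I have in mind, the sketch below only computes a migration plan that moves each tablet from an old tserver to a new tserver on the same node. Everything in it is hypothetical: the class, the "host:port" address format, and the example ports are assumptions, and it does not call any Accumulo API. Actually carrying out the moves would need the primitives proposed in the design doc.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not an Accumulo API: plans same-node tablet moves. */
final class SiblingMigrationPlan {

  /** Returns the host part of a "host:port" server address. */
  static String hostOf(String server) {
    int colon = server.lastIndexOf(':');
    return colon < 0 ? server : server.substring(0, colon);
  }

  /**
   * @param currentServer   tablet id -> address of the tserver hosting it now
   * @param newServerByHost host -> address of the replacement tserver on that host
   * @return tablet id -> destination tserver; tablets whose host has no
   *         replacement tserver are left out and stay where they are
   */
  static Map<String, String> plan(Map<String, String> currentServer,
      Map<String, String> newServerByHost) {
    Map<String, String> moves = new TreeMap<>();
    for (Map.Entry<String, String> e : currentServer.entrySet()) {
      String target = newServerByHost.get(hostOf(e.getValue()));
      // Skip tablets that have no sibling or are already on the new tserver.
      if (target != null && !target.equals(e.getValue())) {
        moves.put(e.getKey(), target);
      }
    }
    return moves;
  }

  public static void main(String[] args) {
    Map<String, String> current = Map.of("tablet1", "node1:9997", "tablet2", "node2:9997");
    Map<String, String> replacements = Map.of("node1", "node1:9998");
    System.out.println(plan(current, replacements)); // prints {tablet1=node1:9998}
  }
}
```

The point is that the planning logic is a few lines an administrator could write on the spot. A balancer alone has no notion of "the new tserver on the same node", so it could not express this plan.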


- kturner


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Josh Elser <jo...@gmail.com>.

> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would actually have over what is currently possible with Accumulo as it stands.
> > 
> > Right now, you can force tablets to migrate by stopping a tserver. This goes back through the balancer, so you have a bit of churn in however many "rounds" the Balancer takes to choose where those tablets should go, and then for the master to process the necessary assignments for each tserver. How I'm seeing it described is that the only piece of the puzzle that we're making better is removing the migration components in favor of letting the user control this directly. How much does a "smart" Balancer implementation close the gap between the user providing migrations in regards to performance? Also, how does removing the Balancer from the equation change the wall time to get a tablet assigned (is it significant)?
> > 
> > We have to also understand that while we can decompose the problem into some simple primitives, I believe this approach is still a rather difficult distributed state problem that I'm worried is being over-architected. My $0.02.

For context, I was reading about HBase's support on the subject and found http://hbase.apache.org/book/node.management.html. Their general approach is to provide a graceful shutdown for regionservers. This is still subject to problems when a large number of servers are stopped at one time. To alleviate some of this pain, they use ZK to store which servers are currently in a "draining state" to avoid new assignments to those nodes -- "[...] decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining. Marking RegionServers to be in the draining state prevents this from happening".
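To show what a "draining state" buys you, here is a minimal sketch of an assignment choice that skips draining servers. It is not HBase's or Accumulo's implementation. The class and method names are hypothetical, the ZK storage is replaced by a plain in-memory set, and "least-loaded by tablet count" is an assumed selection rule.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: pick an assignment target while skipping draining servers. */
final class DrainingAwareChooser {

  /**
   * @param tabletCounts live server name -> number of tablets it hosts
   * @param draining     servers an administrator has marked as draining
   * @return the least-loaded server that is not draining, or empty if every
   *         live server is draining
   */
  static Optional<String> chooseTarget(Map<String, Integer> tabletCounts, Set<String> draining) {
    String best = null;
    int bestCount = Integer.MAX_VALUE;
    // Iterate in name order so that ties are broken deterministically.
    for (Map.Entry<String, Integer> e : new TreeMap<>(tabletCounts).entrySet()) {
      if (draining.contains(e.getKey())) {
        continue; // never assign to a server that is being drained
      }
      if (e.getValue() < bestCount) {
        best = e.getKey();
        bestCount = e.getValue();
      }
    }
    return Optional.ofNullable(best);
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = Map.of("ts1", 10, "ts2", 3, "ts3", 7);
    // ts2 is the least loaded overall, but it is draining, so ts3 is chosen.
    System.out.println(chooseTarget(counts, Set.of("ts2"))); // prints Optional[ts3]
  }
}
```

Without the draining set, tablets moved off one stopping server could land on another server that is about to stop, which is the problem the HBase docs describe.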


- Josh


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Josh Elser <jo...@gmail.com>.

> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would actually have over what is currently possible with Accumulo as it stands.
> > 
> > Right now, you can force tablets to migrate by stopping a tserver. This goes back through the balancer, so you have a bit of churn in however many "rounds" the Balancer takes to choose where those tablets should go, and then for the master to process the necessary assignments for each tserver. How I'm seeing it described is that the only piece of the puzzle that we're making better is removing the migration components in favor of letting the user control this directly. How much does a "smart" Balancer implementation close the gap between the user providing migrations in regards to performance? Also, how does removing the Balancer from the equation change the wall time to get a tablet assigned (is it significant)?
> > 
> > We have to also understand that while we can decompose the problem into some simple primitives, I believe this approach is still a rather difficult distributed state problem that I'm worried is being over-architected. My $0.02.
> 
> Josh Elser wrote:
>     For context, I was reading about HBase's support on the subject and found http://hbase.apache.org/book/node.management.html. Their general approach is to provide a graceful shutdown for regionservers. This is still subject to problems when a large number of servers are stopped at one time. To alleviate some of this pain, they use ZK to store which servers are currently in a "draining state" to avoid new assignments to those nodes -- "[...] decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining. Marking RegionServers to be in the draining state prevents this from happening".
>
> kturner wrote:
>     An alternative to this design is one that Mike mentioned on the issue: temporarily replace the balancer. I am thinking that providing these primitives for manipulating tablets will allow an administrator to quickly script a one-off solution to a problem, in addition to solving the rolling restart problem. You do not get this quick flexibility with writing a new balancer.
>
>     Killing tablet servers is a solution. I think it would be nice to have a solution that avoids log recovery, minimizes downtime of individual tablets, preserves locality, and is easy to use. It does not have to be this solution. Without additional scripts, the primary use case in 1454 would not be easy to use. A balancer alone would not be enough to achieve the goal of migrating tablets between old and new tservers on the same node. However, a balancer plus tserver states like the ones you mentioned from HBase may provide enough. We should probably explore the balancer option a bit more.
> 
> kturner wrote:
>     One other thing I was thinking about was that you can not make assumptions about the environment.  Users may not use the Accumulo scripts to start and stop tservers.
> 
> Josh Elser wrote:
>     I think there would be merit in enumerating what would be needed by a custom Balancer. Is it really something that would need to be written on a per-instance basis, or is there something we could provide that would be more conducive to "heavy" tserver churn?
>     
>     I would definitely not advocate killing tservers. A graceful shutdown would be much more desirable. We get a little bit of help here by the client-side scan retries for not having to quiesce all reads to a tablet, but that could still introduce more latency for a query (e.g. lots of filtering over a large row).
>     
>     As mentioned about concerns with the final two-tservers-per-node approach, I'm not entirely convinced that "sibling" tservers are worth the complexity. We really don't have that much locality in how we use HDFS now. Is trying to keep all of the tablets assigned on the same node going to make things much more efficient over assigning them to nodes elsewhere? I don't even have a good grasp of what these perf numbers would be at a high level.
> 
> kturner wrote:
>     Eric looked into locality once when running continuous ingest and found that ~50% of tablets had local data.    This matches expectations as the default balancer will try to migrate one child after a split.
>     
>     The sibling tserver concept may be too complex to implement.  Sigh, but it's so cool :)

Clarification on what I meant by locality: we don't consider HDFS block locations when we choose where Tablets get assigned, AFAIK. Yes, we'll have locality when we're slamming Accumulo with ingest, but once we start agitating at any reasonable rate, that's going to be lost.

Requiring sibling tservers also implies that you have ample extra resources on a node which is absolutely not going to be the case for most systems. It would be nice, but it sounds to me like a one-off from what would be the norm. :)


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51006
-----------------------------------------------------------




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Josh Elser <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51006
-----------------------------------------------------------


One big design concern I have is what gains the final solution would actually have over what is currently possible with Accumulo as it stands.

Right now, you can force tablets to migrate by stopping a tserver. This goes back through the balancer, so you have a bit of churn in however many "rounds" the Balancer takes to choose where those tablets should go, and then for the master to process the necessary assignments for each tserver. How I'm seeing it described is that the only piece of the puzzle that we're making better is removing the migration components in favor of letting the user control this directly. How much does a "smart" Balancer implementation close the gap between the user providing migrations in regards to performance? Also, how does removing the Balancer from the equation change the wall time to get a tablet assigned (is it significant)?

We have to also understand that while we can decompose the problem into some simple primitives, I believe this approach is still a rather difficult distributed state problem that I'm worried is being over-architected. My $0.02.

- Josh Elser




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Josh Elser <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review74840
-----------------------------------------------------------


Was thinking about this some more today again.


docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment121624>

    Disabling tablet assignment across the cluster could have unintended negative consequences. If a tabletserver dies during a rolling upgrade, the tablets it's hosting would be unavailable until that server is restarted as a part of the rolling upgrade script. For large numbers of tservers, that could be an extended outage.
    
    It would be better if we could identify some batch of tabletservers, mark all tablets currently hosted on those tablet servers as "disabled", and prevent any migrations to those servers. This would allow the rest of the cluster to continue to operate as normal, while avoiding reassignment churn on the nodes being restarted.
    
    It would be more difficult to implement in the master than simply disabling assignment completely. We might be able to do it fairly easily with a new value in TabletGoalState, but that's only after a very naive look at the code recently.
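
    To make the batch idea above concrete, here is a minimal sketch of the filtering step only. It assumes the master keeps a set of "draining" servers somewhere (ZK, for example); the class and method names are hypothetical and none of this is existing Accumulo code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: these names do not exist in Accumulo.
class DrainingServers {

  /** Returns the live servers that may still receive tablet assignments. */
  static List<String> assignable(List<String> liveServers, Set<String> draining) {
    List<String> result = new ArrayList<>();
    for (String server : liveServers) {
      // Servers in the batch being restarted are skipped as assignment targets.
      if (!draining.contains(server)) {
        result.add(server);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> live = Arrays.asList("host1:9997", "host2:9997", "host3:9997");
    Set<String> draining = new HashSet<>(Arrays.asList("host2:9997"));
    System.out.println(assignable(live, draining));
  }
}
```

    The rest of the cluster keeps taking assignments as usual; only the servers in the draining set are excluded.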



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment121623>

    I have no idea how it would be done, but I found myself lamenting that we couldn't somehow let the master restart a tabletserver instead of just shutting it down.
    
    That would alleviate the shell-scripting burden, but I can't think of a way to actually make that happen. I'm going to look at what HBase has for their scripting of RU.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment121622>

    Maybe it would be better to create an Enum for "assignment state". The first two values in this enum would be "DISABLE", "ENABLE". This would give us some more flexibility in supporting additional states in the future, although I can't directly come up with a concrete example.
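
    A rough sketch of what such an enum could look like, purely as an illustration. The name AssignmentState and the parse helper are assumptions, not something taken from the design doc.

```java
import java.util.Locale;

// Hypothetical: the design doc may name or shape this differently.
enum AssignmentState {
  DISABLE,
  ENABLE;

  /** Parses a user-supplied value, ignoring case and surrounding whitespace. */
  static AssignmentState parse(String value) {
    return valueOf(value.trim().toUpperCase(Locale.ROOT));
  }
}

class AssignmentStateDemo {
  public static void main(String[] args) {
    System.out.println(AssignmentState.parse(" enable "));
  }
}
```

    Adding a third state later would only mean adding a constant, which is the flexibility argument for an enum over a boolean.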


- Josh Elser




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51400
-----------------------------------------------------------


I was thinking about this over the weekend.  What happens w/ the metadata table is not considered when assignment is disabled.  Also the design doc only considers restarting one node at a time.  Would probably want to do a few nodes at a time on a larger cluster.  I am going to give these issues some more thought and update the doc.
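
For the "few nodes at a time" case, the batching itself is simple. The sketch below only splits a list of tserver addresses into fixed-size batches; the class name and the batch size are illustrative assumptions, and how each batch is actually stopped and restarted is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for a rolling-restart script; not part of Accumulo.
class RestartBatches {

  /** Splits the server list into consecutive batches of at most batchSize entries. */
  static List<List<String>> partition(List<String> servers, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < servers.size(); i += batchSize) {
      int end = Math.min(i + batchSize, servers.size());
      // Copy the sub-list so each batch is independent of the input list.
      batches.add(new ArrayList<>(servers.subList(i, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> servers = Arrays.asList("h1:9997", "h2:9997", "h3:9997", "h4:9997", "h5:9997");
    for (List<String> batch : partition(servers, 2)) {
      System.out.println("restart together: " + batch);
    }
  }
}
```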

- kturner




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by mailisto14ken mailisto14ken <mi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review226392
-----------------------------------------------------------


Ship it!

- mailisto14ken mailisto14ken




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Josh Elser <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51310
-----------------------------------------------------------

Ship it!


Minor clarification. Otherwise, LGTM.


docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89490>

    For completeness, it's more than just `instance.secret`; changing any property that starts with `instance.` will cause breakage.
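
    A rolling-restart script could guard against this with a simple prefix check before it starts. The sketch below is only an illustration of that check; the class and method names are made up, and it assumes the caller already has the list of property keys that changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical pre-flight check for a rolling restart; not part of Accumulo.
class InstancePropertyCheck {

  /** Returns the changed keys that start with "instance." and so cannot be rolled out one server at a time. */
  static List<String> unsafeForRollingRestart(Collection<String> changedKeys) {
    List<String> unsafe = new ArrayList<>();
    for (String key : changedKeys) {
      if (key.startsWith("instance.")) {
        unsafe.add(key);
      }
    }
    return unsafe;
  }

  public static void main(String[] args) {
    List<String> changed = Arrays.asList("instance.secret", "tserver.memory.maps.max");
    System.out.println(unsafeForRollingRestart(changed));
  }
}
```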


- Josh Elser




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Eric Newton <er...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51420
-----------------------------------------------------------



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89737>

    nit: spelling


- Eric Newton




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/
-----------------------------------------------------------

(Updated Aug. 21, 2014, 8:12 p.m.)


Review request for accumulo.


Changes
-------

updates based on Josh's and Christopher's comments


Bugs: ACCUMULO-1454
    https://issues.apache.org/jira/browse/ACCUMULO-1454


Repository: accumulo


Description
-------

Posting ACCUMULO-1454 design doc for review


Diffs (updated)
-----

  docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc PRE-CREATION 

Diff: https://reviews.apache.org/r/24855/diff/


Testing
-------


Thanks,

kturner


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.

> On Aug. 21, 2014, 6:55 p.m., Christopher Tubbs wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, lines 79-87
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line79>
> >
> >     Would these be idempotent, or would there be an error thrown if not in the expected state? I'd assume idempotent, but it's not clear here.

I was thinking it would continue w/o raising exception if assignments were already enabled.


> On Aug. 21, 2014, 6:55 p.m., Christopher Tubbs wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 100
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line100>
> >
> >     We might want to seriously think about how we identify tservers in the API. String type is probably a bad idea. HostAndPort, or some unique assigned tserverID is probably better.
> >     
> >     This method could also take a set.

I am going to go w/ String for now, because the existing getTabletServers() and ping() methods use String.  I think a set is a good idea.
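
For comparison, a typed identifier along the lines Christopher suggests might look roughly like the sketch below. TServerAddress is a hypothetical class, not an existing Accumulo or Guava type, and it assumes the "host:port" string form used in the examples in this thread (IPv6 literals are not handled).

```java
// Hypothetical value type; not part of the Accumulo API.
final class TServerAddress {
  final String host;
  final int port;

  private TServerAddress(String host, int port) {
    this.host = host;
    this.port = port;
  }

  /** Parses "host:port"; IPv6 literals are deliberately not handled in this sketch. */
  static TServerAddress parse(String value) {
    int colon = value.lastIndexOf(':');
    if (colon <= 0 || colon == value.length() - 1) {
      throw new IllegalArgumentException("expected host:port but got: " + value);
    }
    int port;
    try {
      port = Integer.parseInt(value.substring(colon + 1));
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("port is not a number: " + value, e);
    }
    if (port < 1 || port > 65535) {
      throw new IllegalArgumentException("port out of range: " + value);
    }
    return new TServerAddress(value.substring(0, colon), port);
  }

  @Override
  public String toString() {
    return host + ":" + port;
  }

  public static void main(String[] args) {
    System.out.println(TServerAddress.parse("1.2.3.4:9997"));
  }
}
```

The benefit is that malformed input is rejected once, at the API boundary, instead of wherever the String is eventually used. The cost is inconsistency with the existing String-based methods.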


> On Aug. 21, 2014, 6:55 p.m., Christopher Tubbs wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 105
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line105>
> >
> >     Would this fail immediately if assignments are disabled? Or hang indefinitely (maybe with the occasional warning message)?

I think it should fail.  I will add something to the doc.
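
Taken together with the idempotency answer above, the intended semantics could be sketched as follows. This is an in-memory illustration only: AssignmentSwitch and its method names are hypothetical, and it assumes the method Christopher is asking about is some kind of tablet load request.

```java
// Hypothetical in-memory model of the proposed semantics; not Accumulo code.
class AssignmentSwitch {
  private boolean enabled = true;

  /** Idempotent: calling this when assignments are already enabled is a no-op. */
  synchronized void enableAssignments() {
    enabled = true;
  }

  /** Idempotent: calling this when assignments are already disabled is a no-op. */
  synchronized void disableAssignments() {
    enabled = false;
  }

  /** Fails immediately, rather than hanging, when assignments are disabled. */
  synchronized void loadTablet(String tablet, String tserver) {
    if (!enabled) {
      throw new IllegalStateException("tablet assignment is disabled");
    }
    // A real implementation would ask the master to assign the tablet here.
  }

  public static void main(String[] args) {
    AssignmentSwitch assignments = new AssignmentSwitch();
    assignments.enableAssignments(); // already enabled, no exception
    assignments.disableAssignments();
    try {
      assignments.loadTablet("tablet-1", "1.2.3.4:9997");
    } catch (IllegalStateException e) {
      System.out.println("load rejected: " + e.getMessage());
    }
  }
}
```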


> On Aug. 21, 2014, 6:55 p.m., Christopher Tubbs wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, lines 23-24
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line23>
> >
> >     I wouldn't want to depend on these specific scripts. I'd expect any actual implementation would be script-independent and that any restart mechanism for the tserver would be sufficient.

I will add something to the testing section.


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51201
-----------------------------------------------------------




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Christopher Tubbs <ct...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51201
-----------------------------------------------------------



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89232>

    Clarification suggestion:
    
    s/Disable tablet assignment/Disable all tablet assignment across the cluster/



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89234>

    I wouldn't want to depend on these specific scripts. I'd expect any actual implementation would be script-independent and that any restart mechanism for the tserver would be sufficient.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89240>

    We are relying on a decent balancer implementation here, which is reasonable.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89243>

    Would be nice to provide a (*very short*) convenience script in contrib to do this, given a list of tserver nodes (or that looks in ZK), just as an example of how to do this.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89245>

    Make sure that the method for initiating a clean shutdown does not depend on stop-here.sh, though stop-here.sh may use that method.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89246>

    Would these be idempotent, or would there be an error thrown if not in the expected state? I'd assume idempotent, but it's not clear here.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89247>

    We might want to seriously think about how we identify tservers in the API. String type is probably a bad idea. HostAndPort, or some unique assigned tserverID is probably better.
    
    This method could also take a set.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89248>

    Would this fail immediately if assignments are disabled? Or hang indefinitely (maybe with the occasional warning message)?


- Christopher Tubbs




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Josh Elser <jo...@gmail.com>.

> On Aug. 21, 2014, 4:58 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 111
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line111>
> >
> >     Would decomission(String) do more/less than what `accumulo admin stop tserver` currently does?
> 
> kturner wrote:
>     Yeah

Cool. It's nice to have explicit implementations/methods for what commands on `accumulo` are running.


> On Aug. 21, 2014, 4:58 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 35
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line35>
> >
> >     stop-here.sh and start-here.sh already can't be used when running more than one tserver per host (e.g. Slider -- accumulo on yarn) because those scripts assume that there is only one process per node.
> >     
> >     This is a bigger problem in regards to the assumptions that the scripts make. I've come to the conclusion already that we need to rethink the scripts to support this.
> >     
> >     I think what you've outlined for rolling restarts still makes sense with multiple tservers per host (assuming the last loc is host:port and not just host)
> 
> kturner wrote:
>     ugh.  Sounds like that situation will be harder to test.   Like you said, I would like the design to support multiple tservers per node even if the scripts do not.

It's a pain, but not impossible. You have to just use `accumulo` directly, manually control where log files get written (as that's done by `start-server.sh`) and make sure that the `--address` is set correctly for what you want.


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51183
-----------------------------------------------------------




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.

> On Aug. 21, 2014, 4:58 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 24
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line24>
> >
> >     The tablets from the node that was restarted should get reassigned back to that node because of the last location recorded for the tablet, right?

Yeah.  Also, I am thinking that even if the tablet does not have the proper last location set, it may still go to the tserver because the tserver has fewer tablets.   Need to test this.  I am going to add a test section to the document based on some of your comments.


> On Aug. 21, 2014, 4:58 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 35
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line35>
> >
> >     stop-here.sh and start-here.sh already can't be used when running more than one tserver per host (e.g. Slider -- accumulo on yarn) because those scripts assume that there is only one process per node.
> >     
> >     This is a bigger problem in regards to the assumptions that the scripts make. I've come to the conclusion already that we need to rethink the scripts to support this.
> >     
> >     I think what you've outlined for rolling restarts still makes sense with multiple tservers per host (assuming the last loc is host:port and not just host)

ugh.  Sounds like that situation will be harder to test.   Like you said, I would like the design to support multiple tservers per node even if the scripts do not.


> On Aug. 21, 2014, 4:58 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 44
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line44>
> >
> >     Important to note that some properties (instance.* specifically) cannot be changed and restarted sequentially as the SystemCredential will have changed.

That will need to be called out in our documentation.   I can add a section about that to the document.


> On Aug. 21, 2014, 4:58 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 61
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line61>
> >
> >     Preemption is a big consideration here in regards to major compactions and scans.
> >     
> >     MajC's over very large tablets, with iterators applied, could take a significant amount of time.
> >     
> >     Scans which are performing large filtering (IntersectingIterator-like operations) could induce a bit of extra latency to the user. They shouldn't see it fail (as long as no external system kills the scan), but it will take a while.
> >     
> >     I think with majc we just want to cancel them. Do we wait for scans to finish before unloading? I can think of considerations for both waiting on them or cancelling them.

Tablet close will attempt to cancel any running compactions.   I can not remember w/ scans.  Will need to test these situations.


> On Aug. 21, 2014, 4:58 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 111
> > <https://reviews.apache.org/r/24855/diff/2/?file=665310#file665310line111>
> >
> >     Would decomission(String) do more/less than what `accumulo admin stop tserver` currently does?

Yeah


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51183
-----------------------------------------------------------




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Josh Elser <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51183
-----------------------------------------------------------


Thanks for the changes, Keith. I think this is much more approachable. I'm guessing that there are still some devils lying in wait, but I like the general approach.


docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89206>

    The tablets from the node that was restarted should get reassigned back to that node because of the last location recorded for the tablet, right?



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89209>

    stop-here.sh and start-here.sh already can't be used when running more than one tserver per host (e.g. Slider -- accumulo on yarn) because those scripts assume that there is only one process per node.
    
    This is a bigger problem in regards to the assumptions that the scripts make. I've come to the conclusion already that we need to rethink the scripts to support this.
    
    I think what you've outlined for rolling restarts still makes sense with multiple tservers per host (assuming the last loc is host:port and not just host)



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89210>

    Important to note that some properties (instance.* specifically) cannot be changed and restarted sequentially as the SystemCredential will have changed.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89211>

    Preemption is a big consideration here in regards to major compactions and scans.
    
    MajC's over very large tablets, with iterators applied, could take a significant amount of time.
    
    Scans which are performing large filtering (IntersectingIterator-like operations) could induce a bit of extra latency to the user. They shouldn't see it fail (as long as no external system kills the scan), but it will take a while.
    
    I think with majc we just want to cancel them. Do we wait for scans to finish before unloading? I can think of considerations for both waiting on them or cancelling them.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment89212>

    Would decomission(String) do more/less than what `accumulo admin stop tserver` currently does?


- Josh Elser




Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/
-----------------------------------------------------------

(Updated Aug. 20, 2014, 5:40 p.m.)


Review request for accumulo.


Changes
-------

Josh convinced me to abandon the concept of concurrently running an old and new tserver for rolling restart.  It was too complex.   I updated the design doc w/ a different approach for rolling upgrade that tries to minimize down time for reads.

Thanks Josh.


Bugs: ACCUMULO-1454
    https://issues.apache.org/jira/browse/ACCUMULO-1454


Repository: accumulo


Description
-------

Posting ACCUMULO-1454 design doc for review


Diffs (updated)
-----

  docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc PRE-CREATION 

Diff: https://reviews.apache.org/r/24855/diff/


Testing
-------


Thanks,

kturner


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Josh Elser <jo...@gmail.com>.

> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 88
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line88>
> >
> >     If a user is programming to this API, how do they know what tservers are available? Shouldn't there be a getTabletServers() method as well?
> >     
> >     Also, it would be better to return a concrete class instead of Iterable (since we'd likely be backing it by some List). Advertise what we're actually returning, and let the user treat it as an Iterable if they so choose.
> 
> kturner wrote:
>     the following tablet server related methods already exist in instance operations. 
>      
>      List<String> getTabletServers();
>      List<ActiveScan> getActiveScans(String tserver) throws AccumuloException, AccumuloSecurityException;
>      List<ActiveCompaction> getActiveCompactions(String tserver) throws AccumuloException, AccumuloSecurityException;
>      void ping(String tserver) throws AccumuloException;
>      
>      Using List would be consistent w/ the rest of the API.  I was thinking that Iterable would allow it to be backed by a scanner over the metadata table.  I was also thinking that something like
>      
>      ```loadTablets(getTablets("1.2.3.4:9997"), "1.2.3.4:9993")```
>      
>      does not have to load the complete list of tablets into memory.  That may not be a worthwhile goal.

API consistency would be best. As long as the results of an API call to get tservers can be used with the proposed new methods, I'm content.

Backing results with a Scanner can be nice, but dealing with concrete structures can also be nice. Kind of have to guess the likelihood that the user is just going to wrap the Iterable in a List/Set anyways (can probably punt on a decision until you actually write code which uses the API).
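To make the trade-off concrete, here is a minimal sketch, assuming a hypothetical loadTablets that consumes an Iterable in batches. FakeMetadataScan stands in for a scanner over the metadata table, and every name below is invented for illustration; none of it is the Accumulo API. It shows that a lazily backed Iterable never needs the full tablet list in memory, while a caller who wants a List can still copy it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: none of these names are part of the Accumulo API.
final class LazyTabletsSketch {

  // Stand-in for a scanner over the metadata table: yields tablet ids one
  // at a time and counts how many have actually been produced.
  static final class FakeMetadataScan implements Iterable<String> {
    private final int total;
    int produced = 0;

    FakeMetadataScan(int total) {
      this.total = total;
    }

    @Override
    public Iterator<String> iterator() {
      return new Iterator<String>() {
        private int next = 0;

        @Override
        public boolean hasNext() {
          return next < total;
        }

        @Override
        public String next() {
          if (!hasNext()) {
            throw new NoSuchElementException();
          }
          produced++;
          return "tablet-" + next++;
        }
      };
    }
  }

  // Hypothetical loadTablets: consumes the Iterable in fixed-size batches,
  // so at most batchSize ids are held in memory at once.
  static int loadTablets(Iterable<String> tablets, String tserver, int batchSize) {
    List<String> batch = new ArrayList<>(batchSize);
    int loaded = 0;
    for (String tablet : tablets) {
      batch.add(tablet);
      if (batch.size() == batchSize) {
        loaded += batch.size(); // a real implementation would send the batch to tserver here
        batch.clear();
      }
    }
    loaded += batch.size(); // final partial batch
    return loaded;
  }

  // What a caller does if they prefer a concrete collection.
  static List<String> toList(Iterable<String> tablets) {
    List<String> out = new ArrayList<>();
    for (String tablet : tablets) {
      out.add(tablet);
    }
    return out;
  }
}
```

If most callers end up calling something like toList anyway, the laziness buys little, which is the guess being weighed above.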


> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 105
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line105>
> >
> >     If you're providing an unloadTablets method, I would think calling loadTablets on a tablet that is already loaded should throw an Exception, not unload it for you.
> 
> kturner wrote:
>     My thinking was that load tablets would load on the specified tserver, irrespective of the tablet's current status.

That means that you would be in favor of an already loaded tablet on another tserver unloading itself first (which means it's essentially a move)? Or are you implying that the requested load of an already assigned tablet would implicitly fail because that operation is impossible?


> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 111
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line111>
> >
> >     I'd lean towards keeping KeyExtent out of user's eyesight.
> 
> kturner wrote:
>     That's what I am thinking.  I just checked where it's used in the API.  MutationsRejectedException, ActiveScan, and ActiveCompaction use it.

The more I thought about it, as long as we clean it up, it's not the worst. The weird part would be pushing the tablet identifier notation ('1;b;a') information into the "user's realm" which makes me a little queasy. I'd rather have something more consumable for them.


> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 115
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line115>
> >
> >     These are going to be coming off of a connector or ZKI, right? I would treat the instance id as implied (not required as an argument). host+port sounds good, but how do you distinguish between localhost, 127.0.0.1, the FQDN and the external IP (if there aren't many)?
> 
> kturner wrote:
>     I suppose the expectation w/ the current methods that take tserver as an argument is that you will use something that came from getTabletServers().  I think the string that comes from that method is host+port, but I'm not positive.

Could always standardize on the Guava HostAndPort since that's what TServerUtils is doing under the hood anyway.
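As a rough illustration of the host+port idea, the sketch below uses a hypothetical TServerAddress value type as a dependency-free stand-in for a class like Guava's HostAndPort. The requireKnown helper is also an assumption: it encodes the expectation that callers pass back a string the instance itself reported, which sidesteps the localhost / 127.0.0.1 / FQDN ambiguity rather than resolving it. IPv6 literals are not handled.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a host+port value type; not part of the Accumulo API.
final class TServerAddress {
  final String host;
  final int port;

  private TServerAddress(String host, int port) {
    this.host = host;
    this.port = port;
  }

  // Parses "host:port"; rejects strings with a missing or invalid port.
  static TServerAddress parse(String s) {
    int idx = s.lastIndexOf(':');
    if (idx <= 0 || idx == s.length() - 1) {
      throw new IllegalArgumentException("expected host:port, got " + s);
    }
    int port;
    try {
      port = Integer.parseInt(s.substring(idx + 1));
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("bad port in " + s, e);
    }
    if (port < 1 || port > 65535) {
      throw new IllegalArgumentException("port out of range in " + s);
    }
    return new TServerAddress(s.substring(0, idx), port);
  }

  // Accepts only an address that matches one of the strings the instance
  // reported (for example, the result of a getTabletServers() call).
  static TServerAddress requireKnown(String s, List<String> reported) {
    TServerAddress candidate = parse(s);
    for (String r : reported) {
      if (parse(r).equals(candidate)) {
        return candidate;
      }
    }
    throw new IllegalArgumentException("unknown tserver: " + s);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof TServerAddress)) {
      return false;
    }
    TServerAddress other = (TServerAddress) o;
    return port == other.port && host.equals(other.host);
  }

  @Override
  public int hashCode() {
    return Objects.hash(host, port);
  }

  @Override
  public String toString() {
    return host + ":" + port;
  }
}
```

With this shape, "localhost:9997" is rejected unless the instance reported that exact host string, so alias resolution never has to be guessed at.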


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review50998
-----------------------------------------------------------


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.

> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 20
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line20>
> >
> >     More specifics on "have a new config and run on a different port" would be desirable.
> >     
> >     On each tserver host, you have tserver1 with conf1. You copy conf1 to conf2, make edits, set ACCUMULO_CONF_DIR in accumulo-env.sh (?), start tserver2 (running with conf2), do whatever tablet migration magic, stop tserver1.
> >     
> >     You're now left with tserver2 running on different ports than you started with and different config dirs. Do you then have to go back and modify conf1 to match conf2 (sans the ports) and start tserver1 and stop tserver2? Is there a simpler way to encapsulate this?

Good point.  This needs to be considered in more detail as it may invalidate this design.


> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 88
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line88>
> >
> >     If a user is programming to this API, how do they know what tservers are available? Shouldn't there be a getTabletServers() method as well?
> >     
> >     Also, it would be better to return a concrete class instead of Iterable (since we'd likely be backing it by some List). Advertise what we're actually returning, and let the user treat it as an Iterable if they so choose.

The following tablet server related methods already exist in instance operations.
 
 List<String> getTabletServers();
 List<ActiveScan> getActiveScans(String tserver) throws AccumuloException, AccumuloSecurityException;
 List<ActiveCompaction> getActiveCompactions(String tserver) throws AccumuloException, AccumuloSecurityException;
 void ping(String tserver) throws AccumuloException;
 
 Using List would be consistent w/ the rest of the API.  I was thinking that Iterable would allow it to be backed by a scanner over the metadata table.  I was also thinking that something like
 
 ```loadTablets(getTablets("1.2.3.4:9997"), "1.2.3.4:9993")```
 
 does not have to load the complete list of tablets into memory.  That may not be a worthwhile goal.


> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 100
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line100>
> >
> >     Does avoiding explicitly providing moveTablet semantics avoid the need for unloadTablets and loadTablets to be FATE ops?

I don't think so, but I was not thinking too much about implementation yet since I am not certain about the concept yet.


> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 105
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line105>
> >
> >     If you're providing an unloadTablets method, I would think calling loadTablets on a tablet that is already loaded should throw an Exception, not unload it for you.

My thinking was that load tablets would load on the specified tserver, irrespective of the tablet's current status.


> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 111
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line111>
> >
> >     I'd lean towards keeping KeyExtent out of user's eyesight.

That's what I am thinking.  I just checked where it's used in the API.  MutationsRejectedException, ActiveScan, and ActiveCompaction use it.


> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 115
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line115>
> >
> >     These are going to be coming off of a connector or ZKI, right? I would treat the instance id as implied (not required as an argument). host+port sounds good, but how do you distinguish between localhost, 127.0.0.1, the FQDN and the external IP (if there aren't many)?

I suppose the expectation w/ the current methods that take tserver as an argument is that you will use something that came from getTabletServers().  I think the string that comes from that method is host+port, but I'm not positive.


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review50998
-----------------------------------------------------------


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.

> On Aug. 19, 2014, 6:21 p.m., Josh Elser wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 105
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line105>
> >
> >     If you're providing an unloadTablets method, I would think calling loadTablets on a tablet that is already loaded should throw an Exception, not unload it for you.
> 
> kturner wrote:
>     My thinking was that load tablets would load on the specified tserver, irrespective of the tablet's current status.
> 
> Josh Elser wrote:
>     That means that you would be in favor of an already loaded tablet on another tserver unloading itself first (which means it's essentially a move)? Or are you implying that the requested load of an already assigned tablet would implicitly fail because that operation is impossible?

For the case where it's loaded on another tserver, I was thinking unload and then load.
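The semantics being discussed can be sketched with a small in-memory model. This is an illustration only, not Accumulo code: AssignmentModel and its methods are hypothetical, and treating a load onto the tserver that already hosts the tablet as a no-op is an added assumption that the thread does not settle.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the proposed semantics; not Accumulo code.
final class AssignmentModel {
  // tablet id -> tserver currently hosting it; an absent key means unloaded
  private final Map<String, String> location = new HashMap<>();

  // Unload a tablet; a no-op if it is not loaded anywhere.
  void unloadTablet(String tablet) {
    location.remove(tablet);
  }

  // Load on the given tserver irrespective of the tablet's current status.
  void loadTablet(String tablet, String tserver) {
    String current = location.get(tablet);
    if (tserver.equals(current)) {
      return; // assumption: already hosted there, nothing to do
    }
    if (current != null) {
      unloadTablet(tablet); // loaded elsewhere: unload first, then load
    }
    location.put(tablet, tserver);
  }

  // Returns the hosting tserver, or null if the tablet is unloaded.
  String locationOf(String tablet) {
    return location.get(tablet);
  }
}
```

Under this model a "move" is just loadTablet called on a tablet that is hosted elsewhere, and a tablet can also be left unloaded by calling unloadTablet alone.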


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review50998
-----------------------------------------------------------


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by Josh Elser <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review50998
-----------------------------------------------------------



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment88887>

    More specifics on "have a new config and run on a different port" would be desirable.
    
    On each tserver host, you have tserver1 with conf1. You copy conf1 to conf2, make edits, set ACCUMULO_CONF_DIR in accumulo-env.sh (?), start tserver2 (running with conf2), do whatever tablet migration magic, stop tserver1.
    
    You're now left with tserver2 running on different ports than you started with and different config dirs. Do you then have to go back and modify conf1 to match conf2 (sans the ports) and start tserver1 and stop tserver2? Is there a simpler way to encapsulate this?



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment88878>

    If a user is programming to this API, how do they know what tservers are available? Shouldn't there be a getTabletServers() method as well?
    
    Also, it would be better to return a concrete class instead of Iterable (since we'd likely be backing it by some List). Advertise what we're actually returning, and let the user treat it as an Iterable if they so choose.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment88888>

    Does avoiding explicitly providing moveTablet semantics avoid the need for unloadTablets and loadTablets to be FATE ops?



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment88883>

    If you're providing an unloadTablets method, I would think calling loadTablets on a tablet that is already loaded should throw an Exception, not unload it for you.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment88881>

    I'd lean towards keeping KeyExtent out of user's eyesight.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment88882>

    These are going to be coming off of a connector or ZKI, right? I would treat the instance id as implied (not required as an argument). host+port sounds good, but how do you distinguish between localhost, 127.0.0.1, the FQDN and the external IP (if there aren't many)?


- Josh Elser


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.

> On Aug. 19, 2014, 5:54 p.m., kturner wrote:
> > docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc, line 15
> > <https://reviews.apache.org/r/24855/diff/1/?file=664453#file664453line15>
> >
> >     Should also mention the live bug fix upgrade use case.  For example, upgrading from 1.7.0 to 1.7.1 while Accumulo is running.

Mike Drob asked about this on the issue.


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review50999
-----------------------------------------------------------


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review50999
-----------------------------------------------------------



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment88880>

    This is a partial solution.   It provides the primitives for a complete solution.



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment88879>

    Should also mention the live bug fix upgrade use case.  For example, upgrading from 1.7.0 to 1.7.1 while Accumulo is running.


- kturner


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51003
-----------------------------------------------------------



docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc
<https://reviews.apache.org/r/24855/#comment88886>

    In Jira Mike Drob asked "Why load/unload instead of move?"
    
    Splitting them gives more flexibility.  For example, a tablet can be left unloaded (if assignment is disabled).  Also, loadTablets will load tablets even if they are not loaded anywhere.


- kturner


Re: Review Request 24855: ACCUMULO-1454 design doc

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/
-----------------------------------------------------------

(Updated Aug. 19, 2014, 5:50 p.m.)


Review request for accumulo.


Bugs: ACCUMULO-1454
    https://issues.apache.org/jira/browse/ACCUMULO-1454


Repository: accumulo


Description
-------

Posting ACCUMULO-1454 design doc for review


Diffs
-----

  docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc PRE-CREATION 

Diff: https://reviews.apache.org/r/24855/diff/


Testing
-------


Thanks,

kturner