Posted to dev@accumulo.apache.org by Ed Coleman <de...@etcoleman.com> on 2015/01/31 16:47:09 UTC

Concern regarding cache (ACCUMULO-3549 and ACCUMULO-3547) in Apache Accumulo 1.6.2 RC3

Eric commented on the vote for RC3:

- - - -
It would be nice to have ACCUMULO-3547 <https://issues.apache.org/jira/browse/ACCUMULO-3549> in 1.6.2.

We are running at scale with it at the moment, and it has made a huge improvement.  I hate to hold up 1.6.2, though.  If it doesn't make it, please update the ticket to point to 1.6.3.
- - - -

I generally agree with this, and it seems that ACCUMULO-3547 will make it into 1.6.2, which I think is the preferable option. My concern is that ACCUMULO-3549 will not also be included in 1.6.2.

In ACCUMULO-3549, Keith assumed that end rows are 10 bytes; I'm not sure that is a good assumption. If end rows are larger than 10 bytes, how much more memory will be required over time? How much faster will it grow?
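To make the question concrete, here is a back-of-the-envelope sketch. It assumes (hypothetically; neither the per-entry overhead nor the entry count comes from ACCUMULO-3549) that each cached entry costs a fixed overhead plus the length of its end row, and that entries accumulate at a rate that does not depend on row length. Under those assumptions, both the memory used and its growth rate scale by (overhead + actual row bytes) / (overhead + 10).

```java
// Back-of-the-envelope estimate of cache memory as a function of end-row size.
// ASSUMPTIONS (hypothetical, not taken from Accumulo or ACCUMULO-3549): each cached
// entry costs a fixed overhead plus its end-row bytes, and the number of entries
// added per hour does not depend on end-row size.
public class EndRowCacheEstimate {

    /** Estimated bytes held by {@code entries} cached entries. */
    static long estimatedBytes(long entries, int overheadBytesPerEntry, int endRowBytes) {
        return entries * (long) (overheadBytesPerEntry + endRowBytes);
    }

    /**
     * Factor by which memory use (and, under the assumptions above, its growth rate)
     * increases when real end rows are actualRowBytes long instead of assumedRowBytes.
     */
    static double growthFactor(int overheadBytesPerEntry, int assumedRowBytes, int actualRowBytes) {
        return (double) (overheadBytesPerEntry + actualRowBytes)
                / (overheadBytesPerEntry + assumedRowBytes);
    }

    public static void main(String[] args) {
        int overhead = 64;           // hypothetical per-entry object/map overhead in bytes
        long entries = 10_000_000L;  // hypothetical number of cached entries
        for (int endRow : new int[] {10, 100, 1000}) {
            long mib = estimatedBytes(entries, overhead, endRow) / (1024 * 1024);
            System.out.printf("end row %4d bytes: ~%,d MiB (x%.1f vs. 10-byte rows)%n",
                    endRow, mib, growthFactor(overhead, 10, endRow));
        }
    }
}
```

The smaller the real per-entry overhead is relative to the row length, the closer the factor gets to the plain ratio of row sizes (for example, 100-byte rows would cost close to 10x the memory of 10-byte rows).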

Without ACCUMULO-3549, what are my options for monitoring / correcting the situation if the cache grows too large? Will tablet server performance slowly degrade over time because the cache keeps growing? What will users need to do to monitor and then correct this? Will we end up in a situation where tservers start to run out of memory, we increase the memory allocation if we can, and we just kick the can a little further down the road while performance keeps degrading?

Is there a way to trigger the cache to clear short of restarting a tserver? While not optimal, a utility / script that slowly walks across the tservers and clears their caches, so that each tserver's cache is cleared every 12, 24, 48,... hours, may be a bridge until ACCUMULO-3549 is resolved. If we go that route, getting the fix into 1.6.3 would also seem to be a priority.
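A rolling clear like this is mostly a scheduling question. The sketch below only computes the schedule: with N tservers and a period P, server i is visited at offset i * P / N, so each server is cleared once per period and only one server is disrupted at a time. The hostnames are hypothetical, and the clearing action itself is deliberately omitted, since no such hook is assumed to exist.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical rolling-clear schedule. It does NOT clear anything; it only computes
// when each tserver would be visited so that clears are spread evenly over a period.
public class StaggeredClearSchedule {

    /** Offset from the start of each period at which server i (0-based) would be cleared. */
    static List<Duration> offsets(int serverCount, Duration period) {
        if (serverCount <= 0) throw new IllegalArgumentException("serverCount must be positive");
        List<Duration> result = new ArrayList<>(serverCount);
        Duration step = period.dividedBy(serverCount); // gap between consecutive servers
        for (int i = 0; i < serverCount; i++) {
            result.add(step.multipliedBy(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical hostnames; 9997 is the default tserver client port.
        List<String> tservers = List.of("tserver1:9997", "tserver2:9997", "tserver3:9997", "tserver4:9997");
        List<Duration> offsets = offsets(tservers.size(), Duration.ofHours(24));
        for (int i = 0; i < tservers.size(); i++) {
            System.out.println(tservers.get(i) + " cleared at +" + offsets.get(i).toHours() + "h");
        }
    }
}
```

With four servers and a 24-hour period, servers are visited six hours apart; choosing a 12- or 48-hour period only changes the spacing.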

Maybe this has been discussed and resolved, but I want to bring it up to ensure that the ramifications have been considered and that there is a viable mitigation strategy that is communicated to users. Sorry for the doom-and-gloom, end-of-the-world tone; I was just trying to emphasize the worst-case scenarios I could envision. I think ACCUMULO-3547 is an important (even necessary) improvement, and I'm not suggesting that it be removed - I just want to make sure that I understand the other side effects and know our options.

Ed Coleman



Re: Concern regarding cache (ACCUMULO-3549 and ACCUMULO-3547) in Apache Accumulo 1.6.2 RC3

Posted by Eric Newton <er...@gmail.com>.
We're working on ACCUMULO-3549, and a pretty conservative fix will be
committed on Monday.


On Sat, Jan 31, 2015 at 12:48 PM, Josh Elser <jo...@gmail.com> wrote:

> That's a good point, Ed, and there hasn't been any other discussion (on
> the mailing lists) so you did the right thing bringing this up here.
>
> There is no user administration or monitoring support that would allow
> user intervention (aside from restarting a tserver, which is a no-go). If
> we're going to include it, as it appears we will, we need to both make
> sure that the cache is bounded in size and have as many people as possible
> look at it (since it's such a late addition to the release -- it's common
> for us to only notice subtleties weeks to months after a change is made
> during normal development cycles).

Re: Concern regarding cache (ACCUMULO-3549 and ACCUMULO-3547) in Apache Accumulo 1.6.2 RC3

Posted by Josh Elser <jo...@gmail.com>.
That's a good point, Ed, and there hasn't been any other discussion (on 
the mailing lists) so you did the right thing bringing this up here.

There is no user administration or monitoring support that would allow
user intervention (aside from restarting a tserver, which is a no-go). If
we're going to include it, as it appears we will, we need to both make sure
that the cache is bounded in size and have as many people as possible
look at it (since it's such a late addition to the release -- it's
common for us to only notice subtleties weeks to months after a change
is made during normal development cycles).
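For readers less familiar with the idea, one common way to bound an in-memory cache in Java is an LRU map built on LinkedHashMap. This sketch is illustrative only: it is not the ACCUMULO-3549 fix, and the class name and eviction policy are assumptions. Capping the number of entries caps memory at roughly maxEntries times the average entry size, however long the cache runs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative size-bounded LRU cache (hypothetical; not the ACCUMULO-3549 patch).
 * Once more than maxEntries mappings are present, the least-recently-accessed one
 * is evicted, so the entry count can never grow without bound.
 */
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    public BoundedLruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least- to most-recently accessed
        super(16, 0.75f, true);
        if (maxEntries <= 0) throw new IllegalArgumentException("maxEntries must be positive");
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked by put/putAll after an insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedLruCache<String, String> cache = new BoundedLruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");          // touch "a" so "b" becomes least recently used
        cache.put("c", "3");     // exceeds the bound, so "b" is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

LinkedHashMap is not thread-safe, and a tserver cache is accessed concurrently, so a real version would need synchronization (for example Collections.synchronizedMap) or a dedicated concurrent cache.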
