Posted to user@accumulo.apache.org by Noe Detore <nd...@minerkasch.com> on 2017/01/11 17:44:01 UTC
Replication Latency
Hello,
I am trying to influence replication latency with tserver.walog.max.age, but
I am noticing no difference when setting the value low. Looking in the code of
org.apache.accumulo.tserver.log.TabletServerLogger:
protected void closeForReplication(Collection<CommitSession> sessions) {
  // TODO We can close the WAL here for replication purposes
}
The method containing this TODO is called by:
    testLockAndRun(logSetLock, new TestCallWithWriteLock() {
      @Override
      boolean test() {
        return (logSizeEstimate.get() > maxSize) ||
            ((System.currentTimeMillis() - createTime) > maxAge);
      }
      @Override
      void withWriteLock() throws IOException {
        close();
        closeForReplication(sessions);
      }
    });
    return seq;
  }
I am still trying to understand what is happening here, but could this TODO
be the reason replication status records are not being updated with
'closed: true' sooner?
Thank you
Noe
Re: Replication Latency
Posted by Noe Detore <nd...@minerkasch.com>.
org.apache.accumulo.gc.replication.CloseWriteAheadLogReferences is what I
needed to find.
Thank you
On Wed, Jan 11, 2017 at 8:02 PM, Josh Elser <jo...@gmail.com> wrote:
> Did you look at the accumulo-gc log to actually correlate how often the
> class I sent is being executed?
Re: Replication Latency
Posted by Josh Elser <jo...@gmail.com>.
Did you look at the accumulo-gc log to actually correlate how often the
class I sent is being executed?
Re: Replication Latency
Posted by Noe Detore <nd...@minerkasch.com>.
To be fair, after writing the post I grepped the logs and found my WALs
rolling over on size before the max.age threshold was hit. That is the
reason I did not see any improvement in latency from reducing max.age.
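A minimal sketch of the roll-over check quoted in the original post, to show why lowering max.age alone changes nothing when the size limit trips first. The class, the method names, the constant write rate, and all of the numbers are hypothetical. Only the size-or-age condition comes from the quoted TabletServerLogger code.

```java
public class WalRollSketch {
    public enum Trigger { NONE, SIZE, AGE }

    // Same shape as the quoted test(): roll when either limit is exceeded.
    // Size is reported first when both are exceeded.
    public static Trigger rollTrigger(long logSizeEstimate, long maxSize,
                                      long ageMillis, long maxAgeMillis) {
        if (logSizeEstimate > maxSize) {
            return Trigger.SIZE;
        }
        if (ageMillis > maxAgeMillis) {
            return Trigger.AGE;
        }
        return Trigger.NONE;
    }

    // Approximate time until the first limit is reached, assuming a constant
    // write rate (an assumption made only for this illustration).
    public static long millisUntilRoll(long bytesPerSecond, long maxSize,
                                       long maxAgeMillis) {
        long millisToFill = bytesPerSecond <= 0
                ? Long.MAX_VALUE
                : (maxSize * 1000L) / bytesPerSecond;
        return Math.min(millisToFill, maxAgeMillis);
    }

    public static void main(String[] args) {
        long maxSize = 1_000_000_000L;   // hypothetical size limit in bytes
        long rate = 5_000_000L;          // hypothetical ingest rate in bytes/s

        // The WAL fills in 200s, so a 10m and a 5m max age give the same result.
        System.out.println(millisUntilRoll(rate, maxSize, 600_000L));
        System.out.println(millisUntilRoll(rate, maxSize, 300_000L));
        // Only a max age below the fill time changes when the WAL rolls.
        System.out.println(millisUntilRoll(rate, maxSize, 60_000L));
    }
}
```

Under these made-up numbers the age limit only matters once it drops below the time it takes to fill a WAL, which matches what the logs showed.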
There is still an x factor between when a WAL is no longer written to by the
tserver and when it actually gets replicated that I need to figure out.
For example, my WALs appear to be done being written to (a new WAL is created
on the tserver) after 3m, but replication is taking about 12 to 15 min to
complete. Even though the WAL is not being written to after 3m, I am not seeing
it ready for replication (closed: true) until after 13m.
On Wed, Jan 11, 2017 at 5:44 PM, Josh Elser <jo...@gmail.com> wrote:
> See org.apache.accumulo.gc.replication.CloseWriteAheadLogReferences for
> where WALs are currently marked as "closed".
>
> I don't recall the details, but I think there was some issue with trying
> to close them in TabletServerLogger.
>
> Yes to your last question: if it were done in TabletServerLogger, it would
> be closed more quickly than done by the GC. The issue is whether or not
> it's actually safe to mark them as closed there. I just don't remember the
> internal WAL lifecycle well enough.
Re: Replication Latency
Posted by Josh Elser <jo...@gmail.com>.
See org.apache.accumulo.gc.replication.CloseWriteAheadLogReferences for
where WALs are currently marked as "closed".
I don't recall the details, but I think there was some issue with trying
to close them in TabletServerLogger.
Yes to your last question: if it were done in TabletServerLogger, it
would be closed more quickly than done by the GC. The issue is whether
or not it's actually safe to mark them as closed there. I just don't
remember the internal WAL lifecycle well enough.
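A toy model of the point above: if WALs are only marked closed by a periodic pass, rather than at the moment the tserver rolls them, a WAL waits until the next pass starts. This is a sketch under stated assumptions, not Accumulo code. The class, the method names, the pass interval, and the start offset are all hypothetical, and it ignores how long a pass takes to run.

```java
public class PeriodicCloseSketch {

    // Start time of the earliest pass at or after eventMillis, where passes
    // start at firstPassMillis + k * intervalMillis for k = 0, 1, 2, ...
    public static long nextPassAtOrAfter(long eventMillis, long firstPassMillis,
                                         long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (eventMillis <= firstPassMillis) {
            return firstPassMillis;
        }
        long elapsed = eventMillis - firstPassMillis;
        long passes = (elapsed + intervalMillis - 1) / intervalMillis; // ceiling division
        return firstPassMillis + passes * intervalMillis;
    }

    // How long a WAL that stopped receiving writes at eventMillis waits
    // before the next pass could mark it closed.
    public static long closeDelay(long eventMillis, long firstPassMillis,
                                  long intervalMillis) {
        return nextPassAtOrAfter(eventMillis, firstPassMillis, intervalMillis) - eventMillis;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Hypothetical: WAL rolled at 3m, passes start every 5m from time 0.
        System.out.println(closeDelay(3 * minute, 0L, 5 * minute));
        // A WAL rolled just after a pass starts waits almost a full interval.
        System.out.println(closeDelay(5 * minute + 1, 0L, 5 * minute));
    }
}
```

Whether this accounts for the whole gap between 3m and 13m would need to be checked against the accumulo-gc log, as suggested elsewhere in the thread.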