Posted to user@accumulo.apache.org by Noe Detore <nd...@minerkasch.com> on 2017/01/11 17:44:01 UTC

Replication Latency

Hello,

I am trying to influence replication latency with tserver.walog.max.age, but
I am noticing no difference when setting the value low. Looking in the code of
org.apache.accumulo.tserver.log.TabletServerLogger:

  protected void closeForReplication(Collection<CommitSession> sessions) {
    // TODO We can close the WAL here for replication purposes
  }

This TODO is called by:

    testLockAndRun(logSetLock, new TestCallWithWriteLock() {
      @Override
      boolean test() {
        return (logSizeEstimate.get() > maxSize)
            || ((System.currentTimeMillis() - createTime) > maxAge);
      }

      @Override
      void withWriteLock() throws IOException {
        close();
        closeForReplication(sessions);
      }
    });
    return seq;
  }

I am still trying to understand what is happening here, but could this TODO
be the reason replication status records are not being updated with
'closed: true' sooner?

Thank you
Noe

Re: Replication Latency

Posted by Noe Detore <nd...@minerkasch.com>.

org.apache.accumulo.gc.replication.CloseWriteAheadLogReferences is what I
needed to find.

Thank you

On Wed, Jan 11, 2017 at 8:02 PM, Josh Elser <jo...@gmail.com> wrote:

> Did you look at the accumulo-gc log to actually correlate how often the
> class I sent is being executed?

Re: Replication Latency

Posted by Josh Elser <jo...@gmail.com>.

Did you look at the accumulo-gc log to actually correlate how often the 
class I sent is being executed?


Re: Replication Latency

Posted by Noe Detore <nd...@minerkasch.com>.

To be fair, after writing the post I grepped the logs and found my WALs
rolling over on size before the max.age threshold was hit. That is the
reason I did not see any improvement in latency from reducing max.age.
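
A minimal sketch of why lowering max.age can have no visible effect. It only
mirrors the size-or-age condition from the test() method quoted in the
original post; the class name, the helper millisUntilRoll, the 1 GB size
limit and the constant write rate are assumptions made up for illustration,
not Accumulo code or defaults.

```java
public class WalRollSketch {

    /** Same shape as the quoted test(): roll when either threshold is exceeded. */
    public static boolean shouldRoll(long sizeBytes, long maxSizeBytes,
                                     long ageMillis, long maxAgeMillis) {
        return sizeBytes > maxSizeBytes || ageMillis > maxAgeMillis;
    }

    /**
     * Hypothetical helper: approximate time until the first threshold is
     * reached, assuming a constant write rate in bytes per millisecond.
     */
    public static long millisUntilRoll(long maxSizeBytes, long maxAgeMillis,
                                       long bytesPerMilli) {
        long millisUntilFull = maxSizeBytes / bytesPerMilli;
        return Math.min(millisUntilFull, maxAgeMillis);
    }

    public static void main(String[] args) {
        long maxSize = 1_000_000_000L; // assumed size limit, illustration only
        long rate = 5_000L;            // assumed write rate in bytes per ms

        // The size limit is reached after 200,000 ms, so any max.age above
        // that value gives the same roll time.
        System.out.println(millisUntilRoll(maxSize, 24 * 60 * 60 * 1000L, rate)); // 200000
        System.out.println(millisUntilRoll(maxSize, 10 * 60 * 1000L, rate));      // 200000

        // Only a max.age below the time-to-full changes the roll time.
        System.out.println(millisUntilRoll(maxSize, 60 * 1000L, rate));           // 60000
    }
}
```

Under these assumptions, max.age only matters once it is shorter than the
time the WAL takes to reach its size limit.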

There is still an unknown factor between when a WAL is no longer written to
by the tserver and when it actually gets replicated that I need to figure out.
For example, my WALs appear to be done being written to (a new WAL is created
on the tserver) after 3 minutes, but replication is taking about 12 to 15
minutes to complete. Even though the WAL is not being written to after
3 minutes, I am not seeing it ready for replication (closed: true) until
after 13 minutes.

On Wed, Jan 11, 2017 at 5:44 PM, Josh Elser <jo...@gmail.com> wrote:

> See org.apache.accumulo.gc.replication.CloseWriteAheadLogReferences for
> where WALs are currently marked as "closed".
>
> I don't recall the details, but I think there was some issue with trying
> to close them in TabletServerLogger.
>
> Yes to your last question: if it were done in TabletServerLogger, it would
> be closed more quickly than done by the GC. The issue is whether or not
> it's actually safe to mark them as closed there. I just don't remember the
> internal WAL lifecycle well enough.

Re: Replication Latency

Posted by Josh Elser <jo...@gmail.com>.

See org.apache.accumulo.gc.replication.CloseWriteAheadLogReferences for 
where WALs are currently marked as "closed".

I don't recall the details, but I think there was some issue with trying 
to close them in TabletServerLogger.

Yes to your last question: if it were done in TabletServerLogger, it 
would be closed more quickly than done by the GC. The issue is whether 
or not it's actually safe to mark them as closed there. I just don't 
remember the internal WAL lifecycle well enough.
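
A rough model of why closing from a periodic job is slower than closing at
roll time: if a WAL is only marked closed when the job next runs, the wait is
the gap between the roll and that run. This is a sketch under stated
assumptions, not a description of how the GC actually schedules
CloseWriteAheadLogReferences; the class name, the fixed-period schedule and
the 10-minute period are hypothetical.

```java
public class GcCloseLatencySketch {

    /**
     * Earliest run at or after rollMillis, assuming runs happen at
     * 0, period, 2 * period, ... (a simplification).
     */
    public static long nextRunAtOrAfter(long rollMillis, long periodMillis) {
        long runIndex = (rollMillis + periodMillis - 1) / periodMillis; // ceiling division
        return runIndex * periodMillis;
    }

    /** Time a rolled WAL waits before the periodic job can mark it closed. */
    public static long closeDelay(long rollMillis, long periodMillis) {
        return nextRunAtOrAfter(rollMillis, periodMillis) - rollMillis;
    }

    public static void main(String[] args) {
        long period = 10 * 60 * 1000L; // assumed period, illustration only
        long roll = 3 * 60 * 1000L;    // WAL stops being written at 3 minutes

        // Waits 7 minutes for the next run in this model.
        System.out.println(closeDelay(roll, period)); // 420000

        // Closing at roll time would correspond to a delay of zero.
        System.out.println(closeDelay(period, period)); // 0
    }
}
```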
