You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Demai Ni <ni...@gmail.com> on 2014/07/30 00:48:51 UTC

meaning for AgeOfLastAppliedOp in Replication MetricsSink

hi,

A quick question to clarify this  AgeOfLastAppliedOp in MetricsSink.java. I
assume it is used as an indicator about how long for a Sink OP to be
applied; but instead, it is more like to show how long since the last Sink
OP applied

  /**
   * Set the age of the last applied operation
   *
   * @param timestamp The timestamp of the last operation applied.
   * @return the age that was set
   */
  public long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = System.currentTimeMillis() - lastTimestampForAge;
    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
    return age;
  }

In the following scenario:
1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is
set for example 100ms;
2) and then NO new Sink op occur.
3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of
return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, which
doesn't make sense, right?

should we put a check for (lastTimestampForAge != timestamp) before refresh
the age?

Thanks

Demai

Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink

Posted by Demai Ni <ni...@gmail.com>.
Lars,

But I see you're looking at 0.98. There indeed we are called
> refreshAgeOfLastAppliedOp every time we call getStats(), which would
> increase that metric even when there is nothing to ship. That looks like a
> bug.
>

yeah. that is what I am thinking. Let's have a quick fix on it. I will
provide a patch later today. (was working on HBASE-9531 when the problem
surfaces).

Demai


On Wed, Jul 30, 2014 at 10:41 AM, lars hofhansl <la...@apache.org> wrote:

> I was also confusing ageOfLastShippedEdit with ageOfLastAppliedEdit. I had
> fixed ageOfLastShippedEdit on the source in 0.94.
> Looks like 0.94 is doing the right thing with ageOfLastAppliedEdit, but
> 0.98+ is not.
>
> -- Lars
>
>   ------------------------------
>  *From:* lars hofhansl <la...@apache.org>
> *To:* Demai Ni <ni...@gmail.com>; "dev@hbase.apache.org" <
> dev@hbase.apache.org>
> *Sent:* Wednesday, July 30, 2014 10:30 AM
>
> *Subject:* Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink
>
> > When the  'the time an edit entered the system' doesn't change (in the
> case
> of  no Sink Op enter for a period of time), the age will keep growing
> since current time moving forward, which gives a false impression that
> an edit sitting in the queue for very long time. isn't it?
>
>
> I think until the item is shipped it should be counted as waiting. I.e.
> the time this reports is the time between an edit entered the system and
> the time it finally gets shipped to the replication
> sink.refreshAgeOfLastAppliedOp() should only be called when something is
> actually being shipped, not periodically (I just fixed that in 0.94
> HBASE-11143).
>
>
>
> But I see you're looking at 0.98. There indeed we are called
> refreshAgeOfLastAppliedOp every time we call getStats(), which would
> increase that metric even when there is nothing to ship. That looks like a
> bug.
>
> -- Lars
>
>
>
> ________________________________
>
> From: Demai Ni <ni...@gmail.com>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <
> larsh@apache.org>
> Sent: Wednesday, July 30, 2014 8:49 AM
> Subject: Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink
>
>
>
> Lars, thanks for your input.
>
>
> This metric indicates the time an edit sat in the "replication queue"
> before it got replicated.
> yeah, I am with you on this.
>
>
> With that definition it is doing the right thing: Reporting current time -
> the time an edit entered the system (it's WAL time)
> >
> When the  'the time an edit entered the system' doesn't change (in the
> case of  no Sink Op enter for a period of time), the age will keep growing
> since current time moving forward, which gives a false impression that an
> edit sitting in the queue for very long time. isn't it?
>
>
>
>
> On Tue, Jul 29, 2014 at 10:54 PM, lars hofhansl <la...@apache.org> wrote:
>
> This metric indicates the time an edit sat in the "replication queue"
> before it got replicated.
> >With that definition it is doing the right thing: Reporting current time
> - the time an edit entered the system (it's WAL time)
> >
> >
> >-- Lars
> >
> >
> >
> >________________________________
> > From: Demai Ni <ni...@gmail.com>
> >To: "dev@hbase.apache.org" <de...@hbase.apache.org>
> >Sent: Tuesday, July 29, 2014 3:48 PM
> >Subject: meaning for AgeOfLastAppliedOp in Replication MetricsSink
> >
> >
> >
> >hi,
> >
> >A quick question to clarify this  AgeOfLastAppliedOp in MetricsSink.java.
> I
> >assume it is used as an indicator about how long for a Sink OP to be
> >applied; but instead, it is more like to show how long since the last Sink
> >OP applied
> >
> >  /**
> >   * Set the age of the last applied operation
> >   *
> >   * @param timestamp The timestamp of the last operation applied.
> >   * @return the age that was set
> >   */
> >  public long setAgeOfLastAppliedOp(long timestamp) {
> >    lastTimestampForAge = timestamp;
> >    long age = System.currentTimeMillis() - lastTimestampForAge;
> >    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
> >    return age;
> >  }
> >
> >In the following scenario:
> >1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is
> >set for example 100ms;
> >2) and then NO new Sink op occur.
> >3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of
> >return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, which
> >doesn't make sense, right?
> >
> >should we put a check for (lastTimestampForAge != timestamp) before
> refresh
> >the age?
> >
> >Thanks
> >
> >Demai
>
>
>

Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink

Posted by lars hofhansl <la...@apache.org>.
I was also confusing ageOfLastShippedEdit with ageOfLastAppliedEdit. I had fixed ageOfLastShippedEdit on the source in 0.94.
Looks like 0.94 is doing the right thing with ageOfLastAppliedEdit, but 0.98+ is not.


-- Lars



________________________________
 From: lars hofhansl <la...@apache.org>
To: Demai Ni <ni...@gmail.com>; "dev@hbase.apache.org" <de...@hbase.apache.org> 
Sent: Wednesday, July 30, 2014 10:30 AM
Subject: Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink
 

> When the  'the time an edit entered the system' doesn't change (in the case 
of  no Sink Op enter for a period of time), the age will keep growing 
since current time moving forward, which gives a false impression that 
an edit sitting in the queue for very long time. isn't it?  


I think until the item is shipped it should be counted as waiting. I.e. the time this reports is the time between an edit entered the system and the time it finally gets shipped to the replication sink.refreshAgeOfLastAppliedOp() should only be called when something is actually being shipped, not periodically (I just fixed that in 0.94 HBASE-11143).


But I see you're looking at 0.98. There indeed we are called refreshAgeOfLastAppliedOp every time we call getStats(), which would increase that metric even when there is nothing to ship. That looks like a bug.

-- Lars



________________________________

From: Demai Ni <ni...@gmail.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <la...@apache.org> 
Sent: Wednesday, July 30, 2014 8:49 AM
Subject: Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink



Lars, thanks for your input. 

 
This metric indicates the time an edit sat in the "replication queue" before it got replicated.
yeah, I am with you on this. 
 

With that definition it is doing the right thing: Reporting current time - the time an edit entered the system (it's WAL time)
>
When the  'the time an edit entered the system' doesn't change (in the case of  no Sink Op enter for a period of time), the age will keep growing since current time moving forward, which gives a false impression that an edit sitting in the queue for very long time. isn't it?  




On Tue, Jul 29, 2014 at 10:54 PM, lars hofhansl <la...@apache.org> wrote:

This metric indicates the time an edit sat in the "replication queue" before it got replicated.
>With that definition it is doing the right thing: Reporting current time - the time an edit entered the system (it's WAL time)
>
>
>-- Lars
>
>
>
>________________________________
> From: Demai Ni <ni...@gmail.com>
>To: "dev@hbase.apache.org" <de...@hbase.apache.org>
>Sent: Tuesday, July 29, 2014 3:48 PM
>Subject: meaning for AgeOfLastAppliedOp in Replication MetricsSink
>
>
>
>hi,
>
>A quick question to clarify this  AgeOfLastAppliedOp in MetricsSink.java. I
>assume it is used as an indicator about how long for a Sink OP to be
>applied; but instead, it is more like to show how long since the last Sink
>OP applied
>
>  /**
>   * Set the age of the last applied operation
>   *
>   * @param timestamp The timestamp of the last operation applied.
>   * @return the age that was set
>   */
>  public long setAgeOfLastAppliedOp(long timestamp) {
>    lastTimestampForAge = timestamp;
>    long age = System.currentTimeMillis() - lastTimestampForAge;
>    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>    return age;
>  }
>
>In the following scenario:
>1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is
>set for example 100ms;
>2) and then NO new Sink op occur.
>3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of
>return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, which
>doesn't make sense, right?
>
>should we put a check for (lastTimestampForAge != timestamp) before refresh
>the age?
>
>Thanks
>
>Demai

Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink

Posted by lars hofhansl <la...@apache.org>.
> When the  'the time an edit entered the system' doesn't change (in the case 
of  no Sink Op enter for a period of time), the age will keep growing 
since current time moving forward, which gives a false impression that 
an edit sitting in the queue for very long time. isn't it?  


I think until the item is shipped it should be counted as waiting. I.e. the time this reports is the time between an edit entered the system and the time it finally gets shipped to the replication sink.refreshAgeOfLastAppliedOp() should only be called when something is actually being shipped, not periodically (I just fixed that in 0.94 HBASE-11143).


But I see you're looking at 0.98. There indeed we are called refreshAgeOfLastAppliedOp every time we call getStats(), which would increase that metric even when there is nothing to ship. That looks like a bug.

-- Lars



________________________________
 From: Demai Ni <ni...@gmail.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <la...@apache.org> 
Sent: Wednesday, July 30, 2014 8:49 AM
Subject: Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink
 


Lars, thanks for your input. 

 
This metric indicates the time an edit sat in the "replication queue" before it got replicated.
yeah, I am with you on this. 
 

With that definition it is doing the right thing: Reporting current time - the time an edit entered the system (it's WAL time)
>
When the  'the time an edit entered the system' doesn't change (in the case of  no Sink Op enter for a period of time), the age will keep growing since current time moving forward, which gives a false impression that an edit sitting in the queue for very long time. isn't it?  




On Tue, Jul 29, 2014 at 10:54 PM, lars hofhansl <la...@apache.org> wrote:

This metric indicates the time an edit sat in the "replication queue" before it got replicated.
>With that definition it is doing the right thing: Reporting current time - the time an edit entered the system (it's WAL time)
>
>
>-- Lars
>
>
>
>________________________________
> From: Demai Ni <ni...@gmail.com>
>To: "dev@hbase.apache.org" <de...@hbase.apache.org>
>Sent: Tuesday, July 29, 2014 3:48 PM
>Subject: meaning for AgeOfLastAppliedOp in Replication MetricsSink
>
>
>
>hi,
>
>A quick question to clarify this  AgeOfLastAppliedOp in MetricsSink.java. I
>assume it is used as an indicator about how long for a Sink OP to be
>applied; but instead, it is more like to show how long since the last Sink
>OP applied
>
>  /**
>   * Set the age of the last applied operation
>   *
>   * @param timestamp The timestamp of the last operation applied.
>   * @return the age that was set
>   */
>  public long setAgeOfLastAppliedOp(long timestamp) {
>    lastTimestampForAge = timestamp;
>    long age = System.currentTimeMillis() - lastTimestampForAge;
>    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>    return age;
>  }
>
>In the following scenario:
>1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is
>set for example 100ms;
>2) and then NO new Sink op occur.
>3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of
>return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, which
>doesn't make sense, right?
>
>should we put a check for (lastTimestampForAge != timestamp) before refresh
>the age?
>
>Thanks
>
>Demai

Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink

Posted by Demai Ni <ni...@gmail.com>.
Lars, thanks for your input.


> This metric indicates the time an edit sat in the "replication queue"
> before it got replicated.

yeah, I am with you on this.


> With that definition it is doing the right thing: Reporting current time -
> the time an edit entered the system (it's WAL time)
>
When the  'the time an edit entered the system' doesn't change (in the case
of  no Sink Op enter for a period of time), the age will keep growing since
current time moving forward, which gives a false impression that an edit
sitting in the queue for very long time. isn't it?


On Tue, Jul 29, 2014 at 10:54 PM, lars hofhansl <la...@apache.org> wrote:

> This metric indicates the time an edit sat in the "replication queue"
> before it got replicated.
> With that definition it is doing the right thing: Reporting current time -
> the time an edit entered the system (it's WAL time)
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Demai Ni <ni...@gmail.com>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>
> Sent: Tuesday, July 29, 2014 3:48 PM
> Subject: meaning for AgeOfLastAppliedOp in Replication MetricsSink
>
>
> hi,
>
> A quick question to clarify this  AgeOfLastAppliedOp in MetricsSink.java. I
> assume it is used as an indicator about how long for a Sink OP to be
> applied; but instead, it is more like to show how long since the last Sink
> OP applied
>
>   /**
>    * Set the age of the last applied operation
>    *
>    * @param timestamp The timestamp of the last operation applied.
>    * @return the age that was set
>    */
>   public long setAgeOfLastAppliedOp(long timestamp) {
>     lastTimestampForAge = timestamp;
>     long age = System.currentTimeMillis() - lastTimestampForAge;
>     rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>     return age;
>   }
>
> In the following scenario:
> 1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is
> set for example 100ms;
> 2) and then NO new Sink op occur.
> 3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of
> return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, which
> doesn't make sense, right?
>
> should we put a check for (lastTimestampForAge != timestamp) before refresh
> the age?
>
> Thanks
>
> Demai
>

Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink

Posted by lars hofhansl <la...@apache.org>.
This metric indicates the time an edit sat in the "replication queue" before it got replicated.
With that definition it is doing the right thing: Reporting current time - the time an edit entered the system (it's WAL time) 


-- Lars



________________________________
 From: Demai Ni <ni...@gmail.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org> 
Sent: Tuesday, July 29, 2014 3:48 PM
Subject: meaning for AgeOfLastAppliedOp in Replication MetricsSink
 

hi,

A quick question to clarify this  AgeOfLastAppliedOp in MetricsSink.java. I
assume it is used as an indicator about how long for a Sink OP to be
applied; but instead, it is more like to show how long since the last Sink
OP applied

  /**
   * Set the age of the last applied operation
   *
   * @param timestamp The timestamp of the last operation applied.
   * @return the age that was set
   */
  public long setAgeOfLastAppliedOp(long timestamp) {
    lastTimestampForAge = timestamp;
    long age = System.currentTimeMillis() - lastTimestampForAge;
    rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
    return age;
  }

In the following scenario:
1) at 7:00am a sink op is applied, and the SINK_AGE_OF_LAST_APPLIED_OP is
set for example 100ms;
2) and then NO new Sink op occur.
3) when a refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of
return the 100ms, the AgeOfLastAppliedOp become 1hour + 100ms, which
doesn't make sense, right?

should we put a check for (lastTimestampForAge != timestamp) before refresh
the age?

Thanks

Demai