You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Joel Koshy (JIRA)" <ji...@apache.org> on 2012/06/30 00:41:43 UTC

[jira] [Created] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Joel Koshy created KAFKA-385:
--------------------------------

             Summary: RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
                 Key: KAFKA-385
                 URL: https://issues.apache.org/jira/browse/KAFKA-385
             Project: Kafka
          Issue Type: Bug
            Reporter: Joel Koshy


As discussed in KAFKA-353:
1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434655#comment-13434655 ] 

Joel Koshy commented on KAFKA-385:
----------------------------------

I'm about to upload a rebased patch so I'd like to incorporate as much of this as I can.

> 1. Can we make the reporters pluggable? We shouldn't hard code those, you should just give something like 
  metrics.reporters=com.xyz.MyReporter, com.xyz.YourReporter 

These (and ConsoleReporter) are the only reporters currently available. It would be good to make it pluggable, but their constructors are different. i.e., I don't think it is possible to do this without using a spring-like framework.

> 2. Please remove the reference to scala class names from the logging (e.g. DelayedProduce)

I don't follow this comment - can you clarify? Which file are you referring to?

> 3. How are you measuring the performance impact of the change you made to synchronization?

I did not - I can measure the satisfaction time with and without the synchronization but I doubt it would add much overhead. Also, we have to add synchronization one way or the other - either inside the purgatory or outside (i.e,. burden the client's usage).

> 4. It would be good to cut-and-paste the scala timer class they provide in the scala wrapper. That is really nice. 

They == ? Not clear on this - can you clarify?

> 5. Size. 
I think the overhead is the following: 
8 byte pointer to the value 
12 byte object header 
8 byte value 
Total: 28 bytes 

> This is too much memory for something that should just be monitoring. I think we should not do per-key histograms. 

This is certainly a concern. So with the scenario I have in my previous comment, this would be > 13MB per broker and > double that if I use exponentially decaying sampling. That said, there is only one per-key histogram (which is the follower catch up time). OTOH since the main use I can think of is to see which followers are slower we can achieve that with grep's in the log. So I guess the visual benefit comes at a prohibitive memory cost.

                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431475#comment-13431475 ] 

Jun Rao commented on KAFKA-385:
-------------------------------

Thanks for the patch. Overall, it seems that metrics-core is easy to use. Some comments:

1. KafkaMetricsGroup.metricName: add a comment on the following statement and explain what it does
          actualPkg.replaceFirst("""\.""", ".%s.".format(ident))

2. Unused imports: KafkaApis, KafkaConfig

3. Pool.getAndMaybePut(): This seems to force the caller to create a new value object on each call and in most cases, the new object is not needed.

4. FetchRequestKey and ProduceRequestKey: can they be shared?

5. Is it better to put all metrics classes in one package metrics?

6. It's useful to have topic level stats. I am wondering if we can just keep stats at the global and the topic level, but not at topic/partition level. 

7. DelayedProducerRequestMetrics,DelayedFetchRequestMetrics : Can we name the stats consistently? Something like FetchSatisfiedTime and ProduceStatisfiedTime,  FetchSatisfiedRequest and ProduceStatisfiedRequest. Also, there is some overlap with the stats in BrokerTopicStat.

8. ProducerPerformance: Why forcing producer acks to be 2? Shouldn't that come from the command line?

9. The changes and the new files in config/: Are they needed?
                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>             Fix For: 0.8
>
>         Attachments: KAFKA-385-v1.patch, example_dashboard.jpg, graphite_explorer.jpg
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-385:
-----------------------------

    Attachment: KAFKA-385-v2.patch


The jira system was down around the time of my last comment and it was not
sent to the mailing list. That further explains some of these updates.

1 - Added produce satisfied stats.
2 - Common request key for fetch/produce.
3 - Factory method for getAndMaybePut.
4 - Added doc describing fudged name for KafkaMetricsGroup.
5 - Fixed the issues with the JMX operations I had earlier.
6 - Reverted the ProducerPerformance and temporary config changes.

I decided against doing the optimization for the FetchRequestPurgatory
checkSatisfied as I don't think it makes a significant difference, as the
available bytes computation is pretty much in memory.

One more comment on the usage of metrics: it is possible to combine
rate/histogram metrics into a metrics Timer object. I chose not to do this
because the code turns out a little cleaner by using a meter/histogram, but
will revisit later.

                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Jay Kreps (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434602#comment-13434602 ] 

Jay Kreps commented on KAFKA-385:
---------------------------------

1. Can we make the reporters pluggable? We shouldn't hard code those, you should just give something like
  metrics.reporters=com.xyz.MyReporter, com.xyz.YourReporter
2. Please remove the reference to scala class names from the logging (e.g. DelayedProduce)
3. How are you measuring the performance impact of the change you made to synchronization?
4. It would be good to cut-and-paste the scala timer class they provide in the scala wrapper. That is really nice.
5. Size.
I think the overhead is the following:
8 byte pointer to the value
12 byte object header
8 byte value
Total: 28 bytes

This is too much memory for something that should just be monitoring. I think we should not do per-key histograms.


                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy reassigned KAFKA-385:
--------------------------------

    Assignee: Joel Koshy
    
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>             Fix For: 0.8
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-385:
-----------------------------

    Labels: bugs  (was: )
    
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-385:
-----------------------------

    Attachment: KAFKA-385-v3-with-lazy-fix.patch

Attached an incremental patch over v3 to illustrate.
                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch, KAFKA-385-v3-with-lazy-fix.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-385:
-----------------------------

    Attachment: KAFKA-385-v3.patch
                KAFKA-385-v3.patch

Changes over v2:
- Rebased (twice!)
- For the remaining (global) histograms switched to biased histograms.
- Addressed Jay's comments:
  - 1: actually there's a workaround - basically pass through the properties
    to the custom reporter. (I provided an example
    (KafkaCSVMetricsReporter). If JMX operations need to be exposed the
    custom reporter will need to implement an mbean trait that extends from
    KafkaMetricsReporterMBean. I did this to avoid having to implement the
    DynamicMBean interface. Since we now have pluggable reporters I removed
    the KafkaMetrics class and the dependency on metrics-ganglia and
    metrics-graphite.
  - 2: changed the logging statements in KafkaApis to just say producer
    requests/fetch requests.
  - 3: I did a quick test as described above, but couldn't see any
    measurable impact.
  - 4: Added KafkaTimer and a unit test (which I'm thinking of removing as
    it is pretty dumb other than showing how to use it).
  - 5: Got rid of the per-key follower catch up time histogram from
    DelayedProduceMetrics.  Furthemore, meters are inexpensive and the
    per-key caught up follower request meter should be sufficient.

                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-385:
-----------------------------

    Attachment: KAFKA-385-v1.patch
                graphite_explorer.jpg
                example_dashboard.jpg

Summary of changes and notes:

1 - Fixed the synchronization issue (raised in KAFKA-353) between
  checkSatisfied and expire by synchronizing on the DelayedItem.

2 - Added request purgatory metrics using the metrics-core library. Also
  added support for csv/ganglia/graphite reporters which I think is useful -
  e.g., I attached a graphite dashboard that was pretty easy to whip up. It
  should be a breeze to use metrics-core for other stats in Kafka.

3 - This brings in dependencies on metrics and slf4j, both with Apache
  compatible licenses. I don't know of any specific best-practices in using
  metrics-core as I have not used it before, so it would be great if people
  with experience using it glance over this patch.

4 - It's a bit hard to tell right now which metrics are useful and which are
  pointless/redundant.  We can iron that out over time.

5 - Some metrics are only global and both global and per-key (which I think
  is useful to have, e.g., to get a quick view of which partitions are
  slower).  E.g., it helped to see (in the attached screen shots) that fetch
  requests were all expiring - and it turned out to be a bug in how
  DelayedFetch requests from followers are checked for satisfaction.  The
  issue is that maybeUnblockDelayedFetch is only called if required acks is
  0/1. We need to call it always - in the FetchRequestPurgatory
  checkSatisfied method, if it is a follower request then we need to use
  logendoffset to determine the available bytes to the fetch request, and HW
  if it is a non-follower request. I fixed it to always check
  availableFetchBytes, but it can be made a little more efficient by having
  the DelayedFetch request keep track of currently available bytes in each
  topic-partition key.

6 - I realized that both the watchersForKey and per-key metrics pools keep
  growing.  It may be useful to have a simple garbage collector in the Pool
  class that garbage collects entries that are stale (e.g., due to a
  leader-change), but this is non-critical.

7 - I needed to maintain DelayedRequest metrics outside the purgatory:
  because the purgatory itself is abstract and does not have internal
  knowledge of delayed requests and their keys. Note that these metrics are
  for delayed requests - i.e., these metrics are not updated for those
  requests that are satisfied immediately without going through the
  purgatory.

8 - There is one subtlety with producer throughput: I wanted to keep per-key
  throughput, so the metric is updated on individual key satisfaction. This
  does not mean that the DelayedProduce itself will be satisfied - i.e,.
  what the metric reports is an upper-bound since some DelayedProduce
  requests may have expired.

9 - I think it is better to wait for Kafka-376 to go in first. In this
  patch, I hacked a simpler version of that patch - i.e., in
  availableFetchBytes, I check the logEndOffset instead of the
  high-watermark. Otherwise, follower fetch requests would see zero
  available bytes. Of course, this hack now breaks non-follower fetch
  requests.

10 - KafkaApis is getting pretty big - I can try and move DelayedMetrics out
  if that helps although I prefer having it inside since all the
  DelayedRequests and purgatories are in there.

11 - There may be some temporary edits to start scripts/log4j that I will
  revert in the final patch.

What's left to do:

a - This was a rather painful rebase, so I need to review in case I missed
  something.

b - Optimization described above: DelayedFetch should keep track of
  bytesAvailable for each key and FetchRequestPurgatory's checkSatisfied
  should take a topic, partition and compute availableBytes for just that
  key.

c - The JMX operations to start and stop the reporters are not working
  properly. I think I understand the issue, but will fix later.
                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>             Fix For: 0.8
>
>         Attachments: KAFKA-385-v1.patch, example_dashboard.jpg, graphite_explorer.jpg
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-385:
-----------------------------

    Fix Version/s: 0.8
    
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>             Fix For: 0.8
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Jay Kreps (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435458#comment-13435458 ] 

Jay Kreps commented on KAFKA-385:
---------------------------------

+1
                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Closed] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy closed KAFKA-385.
----------------------------

    
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch, KAFKA-385-v3-with-lazy-fix.patch, KAFKA-385-v4.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435689#comment-13435689 ] 

Joel Koshy commented on KAFKA-385:
----------------------------------

Jun had brought up one more problem - the factory method to getAndMaybePut doesn't really
fix the problem of avoiding instantiation of objects if they are already present in the Pool since the
anonymous function that I use needs to instantiate the object. I tweaked the code to use lazy vals
and used logging to verify that each object in the Pool is instantiated only once.

>From what I understand, it seems lazy val's implementation effectively uses a synchronized bitmap
to keep track of whether a particular val has been created or not. However, I'm not so sure how it works
if the val involves a parameter. e.g., lazy val factory = new MyClass(param) as opposed to
lazy val factory = new MyClass The concern is that scala may need to create some internal wrapper
classes (at runtime). I tried disassembling the bytecode but did not want to spend too much time on
it - so I thought I'd ask if anyone know details of how lazy vals work when the actual instance is only
known at runtime?

                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Neha Narkhede (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434332#comment-13434332 ] 

Neha Narkhede commented on KAFKA-385:
-------------------------------------

Patch v2 doesn't apply cleanly on a fresh checkout of 0.8 -

nnarkhed-ld:kafka-385 nnarkhed$ patch -p0 -i ~/Projects/kafka-patches/KAFKA-385-v2.patch 
patching file core/src/main/scala/kafka/Kafka.scala
patching file core/src/main/scala/kafka/api/FetchResponse.scala
patching file core/src/main/scala/kafka/metrics/KafkaMetrics.scala
patching file core/src/main/scala/kafka/metrics/KafkaMetricsConfigShared.scala
patching file core/src/main/scala/kafka/metrics/KafkaMetricsGroup.scala
patching file core/src/main/scala/kafka/server/KafkaApis.scala
Hunk #4 FAILED at 160.
Hunk #12 FAILED at 360.
2 out of 24 hunks FAILED -- saving rejects to file core/src/main/scala/kafka/server/KafkaApis.scala.rej
patching file core/src/main/scala/kafka/server/KafkaConfig.scala
Hunk #1 succeeded at 24 with fuzz 2 (offset 2 lines).
Hunk #2 succeeded at 36 with fuzz 1 (offset 2 lines).
Hunk #3 succeeded at 139 (offset 2 lines).
patching file core/src/main/scala/kafka/server/RequestPurgatory.scala
patching file core/src/main/scala/kafka/utils/Pool.scala
patching file core/src/main/scala/kafka/utils/Utils.scala
Hunk #1 succeeded at 502 (offset 1 line).
patching file core/src/test/scala/unit/kafka/integration/LogCorruptionTest.scala
patching file core/src/test/scala/unit/kafka/integration/TopicMetadataTest.scala
patching file core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala
patching file project/build/KafkaProject.scala

                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434342#comment-13434342 ] 

Joel Koshy commented on KAFKA-385:
----------------------------------

Yes - looks like KAFKA-369 went in yesterday. I will need to rebase now.
                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy resolved KAFKA-385.
------------------------------

    Resolution: Fixed

Thanks for the reviews. Committed to 0.8.
                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch, KAFKA-385-v3-with-lazy-fix.patch, KAFKA-385-v4.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435753#comment-13435753 ] 

Jun Rao commented on KAFKA-385:
-------------------------------

For the issue with getAndMaybePut(), can we add an optional createValue() method to the constructor of Pool? If an object doesn't exist for a key, Pool can call createValue() to create a new value object from key.

A few other comments:
30. KafkaConfig: brokerid should be a required property. So we shouldn't put a default value.

31. satisfiedRequestMeter: Currently, it doesn't include the time for expired requests. I prefer to have a stat that gives the time for all requests, whether expired or not. 

32. I am not sure how useful the metric CaughtUpFollowerFetchRequestsPerSecond is.
                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch, KAFKA-385-v3-with-lazy-fix.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435461#comment-13435461 ] 

Joel Koshy commented on KAFKA-385:
----------------------------------

I forgot to update the running flag in the example reporter - will fix that before check-in.
                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434271#comment-13434271 ] 

Joel Koshy commented on KAFKA-385:
----------------------------------

Just want to add a comment on metrics overheads of using histograms, meters
and gauges.

Meters use an exponential weighted moving average and don't need to maintain
past data.  Histograms are implemented using reservoir sampling and need to
maintain a sampling array. There is only one per-key histogram - the
"follower catch-up time" histogram in DelayedProducerRequestMetrics. There
are four more global histograms.

The sampling array is 1028 atomic longs by default. So if we have say, 500
topics with four partitions each, four brokers, assume that leadership is
evenly spread out, a *lower* bound (if we ignore object overheads and the
global histograms) on memory would be ~ 4MB per broker.

Also, after looking at the metrics code a little more, I think we should use
the exponentially decaying sampling option. By default, the histogram uses a
uniform sample - i.e., it effectively takes a uniform sample from all the
seen data points. Exponential decaying sampling gives more weight to the
past five minutes of data - but it would use at least double the memory of
uniform sampling which pushes memory usage to over 8MB.

                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-385:
-----------------------------

    Attachment: KAFKA-385-v4.patch

Moved the valueFactory to Pool's constructor.
Unit tests/system tests pass.
                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch, KAFKA-385-v3-with-lazy-fix.patch, KAFKA-385-v4.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435782#comment-13435782 ] 

Joel Koshy commented on KAFKA-385:
----------------------------------

That's a good suggestion, and should work as well. I'll make that change tomorrow. It would be good to understand the lazy val implementation as well - will try and dig into some toy examples.

30. Correct - I had reverted that in the last attachment. There was a (temporary I think) reason I needed that default but I don't remember.

For 31 and 32 I think we should defer this to a later discussion. I actually had expiration time before but removed it. I'm not sure it makes a lot of sense. It would perhaps be useful to detect producers that are setting a very low expiration period, but even so it is driven by producer configs and would be a mash-up of values from different producers with expiring requests. Stats can be asymmetric between delayed-produce/delayed-fetch and also between expired/satisfied.

                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch, KAFKA-385-v3-with-lazy-fix.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-385) RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436188#comment-13436188 ] 

Jun Rao commented on KAFKA-385:
-------------------------------

+1 on patch v4.
                
> RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-385
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: example_dashboard.jpg, graphite_explorer.jpg, KAFKA-385-v1.patch, KAFKA-385-v2.patch, KAFKA-385-v3.patch, KAFKA-385-v3.patch, KAFKA-385-v3-with-lazy-fix.patch, KAFKA-385-v4.patch
>
>
> As discussed in KAFKA-353:
> 1 - There is potential for a client-side race condition in the implementations of expire and checkSatisfied. We can just synchronize on the DelayedItem.
> 2 - Would be good to add jmx beans to facilitate monitoring RequestPurgatory stats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira