Posted to common-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2009/09/07 23:40:58 UTC

[jira] Created: (HADOOP-6244) Improvements to FileContext metrics output formatting

Improvements to FileContext metrics output formatting
-----------------------------------------------------

                 Key: HADOOP-6244
                 URL: https://issues.apache.org/jira/browse/HADOOP-6244
             Project: Hadoop Common
          Issue Type: Improvement
          Components: metrics
    Affects Versions: 0.20.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
             Fix For: 0.21.0


The output of FileContext has two big issues: 1) it doesn't include a timestamp, and 2) it doesn't differentiate between tags and metrics in its formatting. This patch improves the output format to make it more useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-6244) Improvements to FileContext metrics output formatting

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-6244:
----------------------------------

    Status: Open  (was: Patch Available)

While timestamps and distinguishing tags from metrics are both useful properties, changing the format of FileContext disrupts all the downstream consumers. Other metrics consumers (e.g. Chukwa) have defined different MetricsContexts to effect formatting changes, rather than modifying this class. That said, either (a) making the format configurable or (b) making FileContext more readily extensible would be welcome changes, since most alternative implementations end up copying FileContext. IIRC, there's a Log4j-based MetricsContext in Chukwa that may be a worthy, if heavyweight, option for a configurable, file-based context.

Other, small notes:
* This seems unnecessary, as the lock is held while accessing {{updaters}}:
{noformat}
-  private void timerEvent() throws IOException {
+  private synchronized void timerEvent() throws IOException {
     if (isMonitoring) {
       Collection<Updater> myUpdaters;
       synchronized (this) {
{noformat}
Either the latter synchronized block needs to be removed (with the comment justifying it) or this should be reverted.
* The protected fields are part of FileContext's API and probably should not be made package-private.
* {{emitNowForTests}} is a regrettable method, but then again, so is the use of {{Timer}}, and the metrics have been without any unit tests for too long. Still, this makes {{timerEvent}} a public API, despite the javadoc. Adding a call to {{timerEvent}} after {{stopTimer}} in {{stopMonitoring}} is tempting, but the caller of the latter method should probably not block if the metrics update does. Since it's only required for FileContext testing at the moment, would it be reasonable to let {{emitNowForTests}} be package-private on FileContext?
* The unit test should use JUnit4 rather than JUnit3 semantics.
* Adding a static class to read records emitted by FileContext would be great, but I think the notion highlights how the metrics package would be better served by adding a FileContext that uses a standard format, like JSON, or one based on Avro.
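
To illustrate the first note above: Java's intrinsic locks are reentrant, so a {{synchronized (this)}} block inside a {{synchronized}} method is redundant, while making the whole method {{synchronized}} changes what runs under the lock. The sketch below is a hypothetical stand-in, not the Hadoop class. The names {{TimerLockSketch}}, {{registerUpdater}}, {{timerEventOriginal}}, {{timerEventPatched}} and {{observe}} are invented for illustration, and it assumes that the remainder of the real method runs the copied updaters after the snapshot is taken.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a metrics context; not the Hadoop implementation.
class TimerLockSketch {
  private final List<Runnable> updaters = new ArrayList<>();

  synchronized void registerUpdater(Runnable updater) {
    updaters.add(updater);
  }

  // Shape of the pre-patch method: only the snapshot is taken under the lock.
  void timerEventOriginal() {
    Collection<Runnable> myUpdaters;
    synchronized (this) {
      myUpdaters = new ArrayList<>(updaters);
    }
    for (Runnable updater : myUpdaters) {
      updater.run(); // runs without holding the context's monitor
    }
  }

  // Shape of the patched method: the inner block re-acquires a monitor the
  // thread already owns, so it no longer narrows the locked region.
  synchronized void timerEventPatched() {
    Collection<Runnable> myUpdaters;
    synchronized (this) { // reentrant acquire, effectively a no-op here
      myUpdaters = new ArrayList<>(updaters);
    }
    for (Runnable updater : myUpdaters) {
      updater.run(); // now runs while the monitor is held
    }
  }

  // Records whether the monitor is held while an updater runs in each variant.
  static List<Boolean> observe() {
    TimerLockSketch ctx = new TimerLockSketch();
    List<Boolean> lockHeldDuringUpdate = new ArrayList<>();
    ctx.registerUpdater(() -> lockHeldDuringUpdate.add(Thread.holdsLock(ctx)));
    ctx.timerEventOriginal();
    ctx.timerEventPatched();
    return lockHeldDuringUpdate;
  }

  public static void main(String[] args) {
    System.out.println(observe()); // prints [false, true]
  }
}
```

Under these assumptions, the patched variant holds the lock for the whole update pass, which is why either the inner block or the method-level {{synchronized}} should go.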



[jira] Updated: (HADOOP-6244) Improvements to FileContext metrics output formatting

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-6244:
--------------------------------

        Fix Version/s:     (was: 0.21.0)
                       0.22.0
    Affects Version/s:     (was: 0.20.0)
                       0.22.0
               Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-6244) Improvements to FileContext metrics output formatting

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773727#action_12773727 ] 

Chris Douglas commented on HADOOP-6244:
---------------------------------------

bq. this is why I left the old format in as a configuration option. Is anyone actually using the old format, though? [...] I think now (ie before 1.0) is the time when we should feel free to change formats/APIs that are clearly bad as long as we provide a deprecated compatibility path, yes?

FileContext has been around for a long time, and while I agree that pre-1.0 is a great time for format and API changes, we're not indifferent to compatibility. The metrics framework has plenty of known awkwardnesses; if it's going to change incompatibly, addressing more than missing timestamps would make sense.

bq. JSON would be reasonable, but I think it's important we continue to have a simpler text metrics logging option. When fishing around on nodes it's good to be able to use perl, grep, and awk without having to install a JSON parser. Avro makes sense for those who want to do long term analysis, but I think that should be a separate patch.

I agree that writing metrics in a templated/standard format should be in a separate issue. I mentioned it to qualify and mostly retract the request for a reader.



[jira] Commented: (HADOOP-6244) Improvements to FileContext metrics output formatting

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865810#action_12865810 ] 

Steve Loughran commented on HADOOP-6244:
----------------------------------------

I'm +1 on timestamps, but worried about breaking things.
# HBASE-1021 added timestamps to HBase metrics without breaking the existing code. But subclasses like that can break too.
# I'd prefer to have a new class/subclass to do the timestamps. This could be added to the classpath of 0.20.x clusters to get timestamping without needing a new Hadoop JAR, and without breaking HBase or other apps. Things like JSON will be handy but, again, separate. We can evolve this at a faster rate than the main h-common release process.




[jira] Commented: (HADOOP-6244) Improvements to FileContext metrics output formatting

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768093#action_12768093 ] 

Hadoop QA commented on HADOOP-6244:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418852/hadoop-6244.txt
  against trunk revision 827860.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/96/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/96/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/96/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/96/console

This message is automatically generated.



[jira] Commented: (HADOOP-6244) Improvements to FileContext metrics output formatting

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865812#action_12865812 ] 

Todd Lipcon commented on HADOOP-6244:
-------------------------------------

Steve, if you want to work on this, feel free to reassign to yourself - this is lower on my priority queue. Thanks!



[jira] Updated: (HADOOP-6244) Improvements to FileContext metrics output formatting

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-6244:
--------------------------------

    Attachment: hadoop-6244.txt

This patch improves the FileContext output format from:

{code}
test1.testRecord: testTag1=testTagValue1, testTag2=testTagValue2, testMetric1=1, testMetric2=33
{code}
to:
{code}
[1252359700030] test1.testRecord(testTag1=testTagValue1, testTag2=testTagValue2) {testMetric1=1, testMetric2=33}
{code}

Just in case anyone depends on the old format, I've added a configuration property for FileContext called "newStyle" that can be set to "false" to revert to the old format. If that seems unnecessary, I'll remove it from the patch.
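
As a rough illustration of the two layouts shown above, the following sketch renders one record in either style, switched by a boolean that mirrors the "newStyle" property. It is not the code from the attached patch: the class {{FileContextFormatSketch}}, the {{format}} signature and the {{join}} helper are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical formatter that reproduces the two example lines; not the patch itself.
class FileContextFormatSketch {

  static String format(long timestampMillis, String context, String record,
                       Map<String, ?> tags, Map<String, ?> metrics,
                       boolean newStyle) {
    if (newStyle) {
      // New style: timestamp first, tags in parentheses, metrics in braces.
      return "[" + timestampMillis + "] " + context + "." + record
          + "(" + join(tags) + ") {" + join(metrics) + "}";
    }
    // Old style: no timestamp, tags and metrics run together in one list.
    StringJoiner all = new StringJoiner(", ");
    if (!tags.isEmpty()) {
      all.add(join(tags));
    }
    if (!metrics.isEmpty()) {
      all.add(join(metrics));
    }
    return context + "." + record + ": " + all;
  }

  // Renders a map as "k1=v1, k2=v2" in the map's iteration order.
  private static String join(Map<String, ?> pairs) {
    StringJoiner joined = new StringJoiner(", ");
    for (Map.Entry<String, ?> e : pairs.entrySet()) {
      joined.add(e.getKey() + "=" + e.getValue());
    }
    return joined.toString();
  }

  public static void main(String[] args) {
    Map<String, String> tags = new LinkedHashMap<>();
    tags.put("testTag1", "testTagValue1");
    tags.put("testTag2", "testTagValue2");
    Map<String, Integer> metrics = new LinkedHashMap<>();
    metrics.put("testMetric1", 1);
    metrics.put("testMetric2", 33);

    System.out.println(format(1252359700030L, "test1", "testRecord", tags, metrics, false));
    System.out.println(format(1252359700030L, "test1", "testRecord", tags, metrics, true));
  }
}
```

With the sample values from the example above, the two calls in {{main}} reproduce the old-style and new-style lines respectively.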



[jira] Commented: (HADOOP-6244) Improvements to FileContext metrics output formatting

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772867#action_12772867 ] 

Todd Lipcon commented on HADOOP-6244:
-------------------------------------

bq. While timestamps and distinguishing tags from metrics are both useful properties, changing the format of FileContext disrupts all the downstream consumers.

Agreed - this is why I left the old format in as a configuration option. Is anyone actually *using* the old format, though? When I wanted to look at metrics logs from FileContext in the past, I ended up actually writing a perl script to interpolate timestamps based on the file's mtime - pretty awful. I think now (ie before 1.0) is the time when we should feel free to change formats/APIs that are clearly bad as long as we provide a deprecated compatibility path, yes?


As for the code notes, I think all of your points are valid - I'll upload a new patch soon.

bq. better served by adding a FileContext using a standard format, like JSON, or base it on Avro.

JSON would be reasonable, but I think it's important we continue to have a simpler text metrics logging option. When fishing around on nodes it's good to be able to use perl, grep, and awk without having to install a JSON parser. Avro makes sense for those who want to do long term analysis, but I think that should be a separate patch.
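
To show that the proposed text format stays easy to pick apart without a JSON parser, here is a minimal sketch of a reader for the new-style line using a single regular expression. The class {{NewStyleLineParser}}, its nested {{Parsed}} type and the {{pairs}} helper are hypothetical, and the pattern assumes that tag and metric values never contain ", ", ")" or "}", which the issue does not guarantee.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for the new-style line; assumes simple, unescaped values.
class NewStyleLineParser {

  // [timestamp] context.record(tag=value, ...) {metric=value, ...}
  private static final Pattern LINE = Pattern.compile(
      "^\\[(\\d+)\\] ([^(]+)\\(([^)]*)\\) \\{([^}]*)\\}$");

  static final class Parsed {
    final long timestampMillis;
    final String name;
    final Map<String, String> tags;
    final Map<String, String> metrics;

    Parsed(long timestampMillis, String name,
           Map<String, String> tags, Map<String, String> metrics) {
      this.timestampMillis = timestampMillis;
      this.name = name;
      this.tags = tags;
      this.metrics = metrics;
    }
  }

  static Parsed parse(String line) {
    Matcher m = LINE.matcher(line);
    if (!m.matches()) {
      throw new IllegalArgumentException("not a new-style line: " + line);
    }
    return new Parsed(Long.parseLong(m.group(1)), m.group(2),
        pairs(m.group(3)), pairs(m.group(4)));
  }

  // Splits "k1=v1, k2=v2" into an ordered map; values are kept as strings.
  private static Map<String, String> pairs(String text) {
    Map<String, String> out = new LinkedHashMap<>();
    if (text.isEmpty()) {
      return out;
    }
    for (String kv : text.split(", ")) {
      int eq = kv.indexOf('=');
      if (eq < 0) {
        throw new IllegalArgumentException("missing '=' in: " + kv);
      }
      out.put(kv.substring(0, eq), kv.substring(eq + 1));
    }
    return out;
  }

  public static void main(String[] args) {
    Parsed p = parse("[1252359700030] test1.testRecord"
        + "(testTag1=testTagValue1, testTag2=testTagValue2)"
        + " {testMetric1=1, testMetric2=33}");
    System.out.println(p.timestampMillis + " " + p.name + " " + p.tags + " " + p.metrics);
  }
}
```

Because tags and metrics sit in separate delimiters, the same split could be done with grep or awk on the node itself, which is the convenience argued for above.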
