Posted to mapreduce-issues@hadoop.apache.org by "Jothi Padmanabhan (JIRA)" <ji...@apache.org> on 2009/09/15 07:09:57 UTC

[jira] Created: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Modify JobHistory to use Avro for serialization instead of raw JSON
-------------------------------------------------------------------

                 Key: MAPREDUCE-980
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-980
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
            Reporter: Jothi Padmanabhan


MAPREDUCE-157 modifies JobHistory to log events in JSON format. JobHistory can be changed to use Avro instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Status: Patch Available  (was: Open)

> Modify JobHistory to use Avro for serialization instead of raw JSON
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-980
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-980
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Jothi Padmanabhan
>            Assignee: Doug Cutting
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-980.patch, MAPREDUCE-980.patch, MAPREDUCE-980.patch, MAPREDUCE-980.patch, MAPREDUCE-980.patch, MAPREDUCE-980.patch, MAPREDUCE-980.patch, MAPREDUCE-980.patch
>
>
> MAPREDUCE-157 modifies JobHistory to log events in JSON format. JobHistory can be changed to use Avro instead.



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757584#action_12757584 ] 

Hudson commented on MAPREDUCE-980:
----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #58 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/58/])
    Modify JobHistory to use Avro for serialization.




[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Attachment: MAPREDUCE-980.patch

Fixed a possible null pointer exception.



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Attachment: MAPREDUCE-980.patch

> Don't we need Javadocs for the newly introduced public constructors in Counter and CounterGroup?

Added.




[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Status: Open  (was: Patch Available)

Cancelling to resolve conflicts created by MAPREDUCE-277.



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757333#action_12757333 ] 

Jothi Padmanabhan commented on MAPREDUCE-980:
---------------------------------------------

On a quick glance, a minor nit: don't we need Javadocs for the newly introduced public constructors in Counter and CounterGroup?



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Attachment: MAPREDUCE-980.patch

Here's a version that passes tests.

I will mark this as "Patch Available" as soon as MAPREDUCE-157 is committed so that Hudson can have a look at it.



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757390#action_12757390 ] 

Jothi Padmanabhan commented on MAPREDUCE-980:
---------------------------------------------

Got it. Thanks.



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757389#action_12757389 ] 

Doug Cutting commented on MAPREDUCE-980:
----------------------------------------

> should the EventWriter.Version be something different than "Avro-Binary"? Something that will help us keep track of schema evolutions

The entire schema is included in the file.  If the schema changes, Avro can still read old data.  We don't need to update the file version if we, e.g., add a field.  If we make such a fundamental change to the schema that Avro's automatic versioning cannot handle it, then we could change the version string to be "Avro-Binary-v2" or something.  Or we could examine the schema itself to determine which version it is.
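
To make the point about reading old data concrete, here is a toy sketch of the idea. It is not the Avro API: records are modelled as plain maps, the reader's schema as a map from field name to default value, and the class name, the resolve method, the "queue" field and the job id string are all made up for illustration. It only shows why adding a field with a default does not require a new file version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of reading old data with a newer schema. This is NOT the Avro API. */
class SchemaResolutionSketch {

    /**
     * Projects a record written with an older schema onto the reader's schema.
     * readerDefaults maps each reader field name to its default value.
     * Fields the writer did not know about take the default; fields the
     * reader no longer declares are dropped.
     */
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            result.put(name, written.containsKey(name) ? written.get(name) : field.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // A record written before a hypothetical "queue" field was added.
        Map<String, Object> old = new LinkedHashMap<>();
        old.put("jobid", "job_200909150709_0001");
        old.put("submitTime", 1252998597000L);

        // The newer reader schema: same fields plus "queue" with a default.
        Map<String, Object> readerDefaults = new LinkedHashMap<>();
        readerDefaults.put("jobid", "");
        readerDefaults.put("submitTime", 0L);
        readerDefaults.put("queue", "default");

        System.out.println(resolve(old, readerDefaults));
    }
}
```

A change that this kind of projection cannot express, such as a field changing meaning, is the case where a new version string would be needed.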



[jira] Assigned: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting reassigned MAPREDUCE-980:
--------------------------------------

    Assignee: Doug Cutting



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757296#action_12757296 ] 

Doug Cutting commented on MAPREDUCE-980:
----------------------------------------

> MAPREDUCE-277 will conflict with this patch

I'm happy to do the merge regardless of which is committed first.

Since MAPREDUCE-277 is a blocker, my preference would be to commit this issue first, then that one, since it is not subject to today's deadline.

I still need a committer to +1 this.




[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Attachment: MAPREDUCE-980.patch

Proper 'svn diff' version of patch now that MAPREDUCE-157 is committed.



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Fix Version/s: 0.21.0
           Status: Patch Available  (was: Open)



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757329#action_12757329 ] 

Doug Cutting commented on MAPREDUCE-980:
----------------------------------------

'ant test-patch' on current patch:

{noformat}
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no new tests are needed for this patch.
     [exec]                         Also please list what manual steps were performed to verify this patch.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
{noformat}



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757302#action_12757302 ] 

Jothi Padmanabhan commented on MAPREDUCE-980:
---------------------------------------------

> my preference would be to commit this issue first, then that one, since it is not subject to today's deadline.

Sorry, just saw this. In the meantime, MAPREDUCE-277 was committed.



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756752#action_12756752 ] 

Philip Zeyliger commented on MAPREDUCE-980:
-------------------------------------------

My experience with generated objects (from a couple of years using protocol buffers) is that one ends up wrapping them often (preferably with composition).  

The generated class is responsible for serialization and deserialization, and the wrapper class is responsible for added logic.  It's hard to make the generator do something reasonable for logic (or even inheritance) cross-language.  Having a wrapper also allows you to have two ways to use something, in two different contexts, where you might want different surrounding logic.  (So, if you had an Avro schema for an Event, the code that generates the Event might use one wrapper, and the code that consumes it might use the raw object, or have a different object.)



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757356#action_12757356 ] 

Doug Cutting commented on MAPREDUCE-980:
----------------------------------------

> should we change the encoder to json [ ... ]

That was my initial instinct too, but Eric and Owen both indicated to me that they preferred that we use binary.

Eric's comment is:

https://issues.apache.org/jira/browse/MAPREDUCE-157?focusedCommentId=12745279&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12745279

Owen has indicated this in offline discussions.  The idea is that one can easily use Avro to dump the binary as JSON, but that the binary is smaller and faster.
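
As a rough illustration of the size argument, the sketch below hand-rolls the zig-zag, variable-length encoding that Avro's binary format uses for long values and compares it with the same number written as JSON text. The class and method names are made up, the timestamp is only an example value, and this is not Avro's encoder class. It covers a single primitive; in a real record the binary form also omits field names, since the schema is stored once in the file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Compares the size of one long value as a zig-zag varint and as JSON text. */
class VarintSizeSketch {

    /** Encodes a long as a zig-zag, variable-length integer (7 payload bits per byte). */
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);            // zig-zag: small magnitudes map to small unsigned values
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {               // more than 7 bits still to write
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits with the continuation bit set
            z >>>= 7;
        }
        out.write((int) z);                       // final byte, continuation bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long submitTime = 1252998597000L;         // illustrative millisecond timestamp
        int binaryBytes = encodeLong(submitTime).length;
        int jsonBytes = Long.toString(submitTime).getBytes(StandardCharsets.UTF_8).length;
        System.out.println("binary: " + binaryBytes + " bytes, JSON text: " + jsonBytes + " bytes");
    }
}
```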

It's a trivial change to make if we prefer JSON instead of binary.



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Attachment: MAPREDUCE-980.patch

Improved javadoc a bit.



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Attachment: MAPREDUCE-980.patch

Merged with MAPREDUCE-277.



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756702#action_12756702 ] 

Doug Cutting commented on MAPREDUCE-980:
----------------------------------------

> Any reason why we can't directly use generated classes ?

You've already cited the biggest reason: the generated classes don't provide constructors or accessors.  Long-term, we could enhance Avro to generate these, but I'm not sure we'd want to directly use the generated classes even then.

The wrappers provide considerable utility, including:
 - Javadoc comments.  We could generate these perhaps from documentation in the schema.
 - Visibility: The wrappers only provide public getters, not setters.  We could perhaps add that to the schema and/or generator.
 - Type conversion: In both the version included in MAPREDUCE-157 and this version there's a fair amount of field-specific type conversion.  For example, we don't directly serialize JobID instances, but rather use JobID's toString() and forName() methods to convert these to and from strings for serialization.  Similarly for counters, task ids, etc.  Ideally all of these would be naturally serializable using Avro, but, until they are, the wrappers make it easy to incorporate things like these.
 - Compatibility: If we update the schema then Avro will handle reading old data, but, without the wrappers, we'd be unable to provide a back-compatible API for accessing the old data.  So if we remove a field from the schema, with the wrappers we're able to deprecate the accessor and implement it in terms of new/remaining fields so that applications don't have to be upgraded.

So I'm not entirely convinced that using wrappers for stuff like this is a bad pattern long term.
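
To make the wrapper argument concrete, here is a minimal sketch of the pattern, not the code from the patch. JobSubmittedRecord stands in for an Avro-generated class, JobId is a simplified stand-in for Hadoop's JobID, and getJobIdString() is a hypothetical legacy accessor; all three are assumptions for illustration.

```java
import java.util.Objects;

final class WrapperSketch {

    /** Stand-in for an Avro-generated record: public fields, no constructor or accessors. */
    public static class JobSubmittedRecord {
        public CharSequence jobid;
        public CharSequence jobName;
        public long submitTime;
    }

    /** Hypothetical, simplified stand-in for Hadoop's JobID (not the real class). */
    public static final class JobId {
        private final String jtIdentifier;
        private final int id;

        public JobId(String jtIdentifier, int id) {
            this.jtIdentifier = jtIdentifier;
            this.id = id;
        }

        /** Parses the form produced by toString(), e.g. "job_200909150709_0001". */
        public static JobId forName(String s) {
            String[] parts = s.split("_");
            if (parts.length != 3 || !parts[0].equals("job")) {
                throw new IllegalArgumentException("Not a job id: " + s);
            }
            return new JobId(parts[1], Integer.parseInt(parts[2]));
        }

        @Override
        public String toString() {
            return String.format("job_%s_%04d", jtIdentifier, id);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof JobId
                && ((JobId) o).id == id
                && ((JobId) o).jtIdentifier.equals(jtIdentifier);
        }

        @Override
        public int hashCode() {
            return Objects.hash(jtIdentifier, id);
        }
    }

    /** Wrapper: constructor plus read-only getters, with type conversion at the boundary. */
    public static final class JobSubmittedEvent {
        private final JobSubmittedRecord datum = new JobSubmittedRecord();

        public JobSubmittedEvent(JobId id, String jobName, long submitTime) {
            datum.jobid = id.toString();   // JobId is stored as a string for serialization
            datum.jobName = jobName;
            datum.submitTime = submitTime;
        }

        public JobId getJobId() { return JobId.forName(datum.jobid.toString()); }
        public String getJobName() { return datum.jobName.toString(); }
        public long getSubmitTime() { return datum.submitTime; }

        /** Hypothetical legacy accessor, kept for old callers and derived from a remaining field. */
        @Deprecated
        public String getJobIdString() { return getJobId().toString(); }
    }
}
```

Callers only ever see JobSubmittedEvent, so the record's public fields and its string encoding of the job id stay an implementation detail.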




[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757261#action_12757261 ] 

Jothi Padmanabhan commented on MAPREDUCE-980:
---------------------------------------------

> I added two more methods to the JobSubmittedEvent

Sorry, I meant two more arguments to the JobSubmittedEvent constructor.



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Attachment: MAPREDUCE-980.patch

Fixed all ivy.xml files to refer to Avro 1.1 rather than 1.0.  Avro, Jackson and Paranamer versions are now specified in library.properties, so this should not occur again.

'ant test-patch' reports:

{noformat}
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no new tests are needed for this patch.
     [exec]                         Also please list what manual steps were performed to verify this patch.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
{noformat}

No new tests are required, as MAPREDUCE-157 supplied sufficient tests.




[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757559#action_12757559 ] 

Hadoop QA commented on MAPREDUCE-980:
-------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420063/MAPREDUCE-980.patch
  against trunk revision 816782.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/116/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/116/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/116/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/116/console

This message is automatically generated.



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Attachment: MAPREDUCE-980.patch



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757307#action_12757307 ] 

Hadoop QA commented on MAPREDUCE-980:
-------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420034/MAPREDUCE-980.patch
  against trunk revision 816664.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/53/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/53/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/53/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/53/console

This message is automatically generated.



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755346#action_12755346 ] 

Jothi Padmanabhan commented on MAPREDUCE-980:
---------------------------------------------

As per [this comment | https://issues.apache.org/jira/browse/MAPREDUCE-157?focusedCommentId=12745279&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12745279], I created this new Jira to track the port to Avro.



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757383#action_12757383 ] 

Jothi Padmanabhan commented on MAPREDUCE-980:
---------------------------------------------

I do not know if I am being naive here, but should the EventWriter.Version be something other than "Avro-Binary"? Something that will help us keep track of schema evolutions, like "1.0"? Or is the version used for a different purpose?
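
To illustrate the question, here is a small sketch of a file header that records both the encoding and a schema version. This is not what the patch does; the class name, the header layout and the "1.0" value are assumptions, and only the "Avro-Binary" string comes from this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class VersionHeaderSketch {
    static final String ENCODING = "Avro-Binary";  // value mentioned in this thread
    static final String SCHEMA_VERSION = "1.0";    // hypothetical schema version

    /** Writes a one-line header of the form "<encoding> <schema version>\n". */
    static void writeHeader(OutputStream out) throws IOException {
        String header = ENCODING + " " + SCHEMA_VERSION + "\n";
        out.write(header.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the header line and returns the schema version, rejecting unknown encodings. */
    static String readSchemaVersion(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        String[] parts = new String(line.toByteArray(), StandardCharsets.UTF_8).split(" ", 2);
        if (parts.length != 2 || !ENCODING.equals(parts[0])) {
            throw new IOException("Unrecognized history file header");
        }
        return parts[1];
    }
}
```

With a header like this, a reader could pick a parsing path from the returned version before decoding any events.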



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756382#action_12756382 ] 

Sharad Agarwal commented on MAPREDUCE-980:
------------------------------------------

It would have been ideal if we could use the Avro-generated classes directly, without wrapping them. Wrapper classes are not very maintainable because every change must be made in two places: the schema definition and the wrapper class. Is there any reason why we can't use the generated classes directly? As far as I can tell, this is done because we want all event classes to have a base interface, a constructor and field getters. Adding a constructor and getters should be straightforward in the code generator. For the base class/interface, I think Avro could generate code from a template. The SpecificRecordBase methods could go directly into the generated class (to work around the lack of multiple inheritance in Java). Users could define the base class, interfaces or additional methods in the template, which would then be used to generate the Avro specific class. I understand that this may not be doable right now, but it is worth considering at some point to make Avro's code generation feature more compelling.
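
As a rough illustration of this proposal, the sketch below shows what a template-driven generator might emit. It is hand-written here, not actual Avro output; HistoryEventBase, TaskStarted and the field layout are all hypothetical.

```java
final class GeneratedClassSketch {

    /** User-defined base type that a hypothetical template would name. */
    public abstract static class HistoryEventBase {
        public abstract String getEventType();
    }

    /** Shape of a generated class with a constructor, getters and inlined index access. */
    public static final class TaskStarted extends HistoryEventBase {
        private CharSequence taskid;
        private long startTime;

        public TaskStarted(CharSequence taskid, long startTime) {
            this.taskid = taskid;
            this.startTime = startTime;
        }

        public CharSequence getTaskid() { return taskid; }
        public long getStartTime() { return startTime; }

        @Override
        public String getEventType() { return "TASK_STARTED"; }

        // Index-based field access emitted into the class itself, so the single
        // superclass slot stays free for the user-defined base class.
        public Object get(int field) {
            switch (field) {
                case 0: return taskid;
                case 1: return startTime;
                default: throw new IndexOutOfBoundsException("Bad field index: " + field);
            }
        }

        public void put(int field, Object value) {
            switch (field) {
                case 0: taskid = (CharSequence) value; break;
                case 1: startTime = (Long) value; break;
                default: throw new IndexOutOfBoundsException("Bad field index: " + field);
            }
        }
    }
}
```

The trade-off against wrappers is that type conversion and deprecated accessors would then have to be expressed in the template or the schema rather than in hand-written code.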





[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Attachment: MAPREDUCE-980.patch

Restore some dropped javadoc and muffed visibility.



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757259#action_12757259 ] 

Jothi Padmanabhan commented on MAPREDUCE-980:
---------------------------------------------

MAPREDUCE-277 will conflict with this patch -- I added two more methods to the JobSubmittedEvent in that patch. Depending on which goes first, the other will have to merge.



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757466#action_12757466 ] 

Owen O'Malley commented on MAPREDUCE-980:
-----------------------------------------

This looks like a good change. I love it when we get to rip out code.

+1



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Status: Open  (was: Patch Available)

Found a bunch more Ivy references to Avro 1.0 that need to be updated to 1.1.



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Attachment: MAPREDUCE-980.patch

Here's a first pass at this.  It must be applied after the MAPREDUCE-157 patch.  It compiles but has not yet been tested.



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757456#action_12757456 ] 

Hadoop QA commented on MAPREDUCE-980:
-------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420061/MAPREDUCE-980.patch
  against trunk revision 816735.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/54/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/54/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/54/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/54/console

This message is automatically generated.



[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757350#action_12757350 ] 

Jothi Padmanabhan commented on MAPREDUCE-980:
---------------------------------------------

Just another clarification -- since the patch uses Avro 1.1, should we change the encoder to JSON instead of binary, so that tools that scrape the logs rather than use EventReaders are still supported?



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated MAPREDUCE-980:
-----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.


[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757184#action_12757184 ] 

Hadoop QA commented on MAPREDUCE-980:
-------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420023/MAPREDUCE-980.patch
  against trunk revision 816647.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    -1 findbugs.  The patch appears to cause Findbugs to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/52/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/52/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/52/console

This message is automatically generated.


[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756993#action_12756993 ] 

Hadoop QA commented on MAPREDUCE-980:
-------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419913/MAPREDUCE-980.patch
  against trunk revision 816454.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/101/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/101/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/101/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/101/console

This message is automatically generated.
