Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2009/12/18 19:22:18 UTC

[jira] Created: (HBASE-2055) Serialize WAL as Avro records

Serialize WAL as Avro records
-----------------------------

                 Key: HBASE-2055
                 URL: https://issues.apache.org/jira/browse/HBASE-2055
             Project: Hadoop HBase
          Issue Type: Improvement
            Reporter: Andrew Purtell
            Assignee: Andrew Purtell
            Priority: Minor


There was some advocacy on hbase-dev@ for using Avro to serialize HBase WAL records. The idea is that Hadoop core is moving away from Writables and Avro is the blessed replacement.

I think we have these criteria for its use:
1) Performance of writing Avro records is no worse than that of writing Writables into a SequenceFile.
2) Space consumed by Avro serialization is no worse than that of Writables.
3) File format is amenable to appends (cannot require valid trailers, etc.).

I'll put up a patch so we can try it out. 
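
To make criterion 3 concrete, here is a minimal sketch in Python of an append-friendly layout. It is not the patch and not Avro's format; the 4-byte length prefix and the names append_record and read_records are assumptions made up for illustration. Each record carries its own length, nothing is written at the end of the file, and a reader simply stops at a record that was cut off.

```python
import io
import struct


def append_record(stream, payload: bytes) -> None:
    """Append one record: a 4-byte big-endian length, then the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_records(data: bytes):
    """Return every complete record and stop quietly at a truncated tail."""
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        start = pos + 4
        end = start + length
        if end > len(data):
            break  # the last record was cut off mid-write; ignore it
        records.append(data[start:end])
        pos = end
    return records


log = io.BytesIO()
for edit in (b"put row1", b"put row2", b"delete row1"):
    append_record(log, edit)

whole = log.getvalue()
truncated = whole[:-3]  # simulate a crash part-way through the last record

print(read_records(whole))      # all three edits
print(read_records(truncated))  # only the two complete edits
```

Because no trailer is needed to interpret the file, a log that ends mid-record still yields every edit that was written in full.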

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2055) Serialize WAL as Avro records

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792979#action_12792979 ] 

Lars George commented on HBASE-2055:
------------------------------------

Great stuff Andy, this block marker makes total sense, also in the context of log splitting, which needs blocks of the log to be presorted by the RSs before they get applied. With the marker this is a natural fit. BigTable does 64k chunks too (during sorts).



[jira] Commented: (HBASE-2055) Serialize WAL as Avro records

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794573#action_12794573 ] 

Jeff Hammerbacher commented on HBASE-2055:
------------------------------------------

Hey Andy,

I don't think putting a snapshot of trunk into your svn would be a great idea. The 1.3 release will have a finalized design and implementation of the file object container.

As for the configurable SYNC_INTERVAL: it makes sense to me to make this a configuration parameter. That said, it would be worth raising your concerns on the AVRO JIRA rather than in this issue.

Thanks,
Jeff



[jira] Updated: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-2055:
----------------------------------

    Attachment: HBASE-2055-v4.patch

v4 should make it easier to extend the reader and writer. THBase will need to do this in order to function. 



[jira] Commented: (HBASE-2055) Serialize WAL as Avro records

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792760#action_12792760 ] 

Jeff Hammerbacher commented on HBASE-2055:
------------------------------------------

An additional criterion: Avro has clients for reading and writing data files in several languages, which will facilitate writing debugging and profiling utilities.



[jira] Commented: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794209#action_12794209 ] 

Andrew Purtell commented on HBASE-2055:
---------------------------------------

@Jeff: When that patch is committed we can look at putting a snapshot of Avro trunk into our svn. Also, I see that SYNC_SIZE is a constant. Should it be configurable? We want 64k; others might want something different.





[jira] Updated: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-2055:
----------------------------------

    Attachment: HBASE-2055-v3.patch

v3 writes a sync marker every 64K, and each marker includes a copy of the schema.

The file is initialized with a sync marker. The reader scans from the start of the file until it finds a valid sync marker and then reads in the schema. 

This is a fair amount of overhead -- 1 byte per record, 1K per 64K of data -- but it does mean edits from corrupt logs can be partially recovered.
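
For anyone who wants to see the resynchronization idea in isolation, here is a rough Python sketch. It is not the v3 patch and not Avro's container format: the fixed MARKER bytes, the length-prefixed layout, and the names BlockWriter and recover are all assumptions made up for illustration, and the interval is shrunk to 32 bytes so the demo stays small. The writer starts the file with a marker plus a schema copy and repeats both once the interval has been reached; the reader scans forward to the first valid marker and picks up the schema from there. At the figures quoted above, 1K per 64K works out to about 1.6% overhead.

```python
import struct

# Hypothetical 16-byte sync marker. A real format would pick it so that it is
# very unlikely to appear inside record data; this sketch does not guard
# against that.
MARKER = bytes(range(0xF0, 0x100))


class BlockWriter:
    """Writes length-prefixed records, restarting a block at each interval."""

    def __init__(self, schema: bytes, sync_interval: int = 64 * 1024):
        self.buf = bytearray()
        self.schema = schema
        self.sync_interval = sync_interval
        self._since_marker = 0
        self._write_marker()  # the file is initialized with a sync marker

    def _write_marker(self) -> None:
        # Marker, then a copy of the schema (4-byte length + bytes).
        self.buf += MARKER + struct.pack(">I", len(self.schema)) + self.schema
        self._since_marker = 0

    def append(self, payload: bytes) -> None:
        if self._since_marker >= self.sync_interval:
            self._write_marker()
        record = struct.pack(">I", len(payload)) + payload
        self.buf += record
        self._since_marker += len(record)


def _read_block(block: bytes):
    """Parse one block body: schema copy first, then complete records."""
    if len(block) < 4:
        return []
    (schema_len,) = struct.unpack_from(">I", block, 0)
    pos = 4 + schema_len
    if pos > len(block):
        return []
    schema = block[4:pos]
    out = []
    while pos + 4 <= len(block):
        (n,) = struct.unpack_from(">I", block, pos)
        if pos + 4 + n > len(block):
            break  # truncated record at the end of the block
        out.append((schema, block[pos + 4:pos + 4 + n]))
        pos += 4 + n
    return out


def recover(data: bytes):
    """Scan to the first valid marker, then read block by block."""
    edits = []
    pos = data.find(MARKER)
    while pos != -1:
        nxt = data.find(MARKER, pos + len(MARKER))
        end = nxt if nxt != -1 else len(data)
        edits.extend(_read_block(data[pos + len(MARKER):end]))
        pos = nxt
    return edits


SCHEMA = b'{"type": "bytes"}'
writer = BlockWriter(SCHEMA, sync_interval=32)
edits = [b"edit-%02d" % i for i in range(10)]
for e in edits:
    writer.append(e)
data = bytes(writer.buf)

# Mangle the first marker: the first block is lost, later blocks survive.
damaged = bytearray(data)
damaged[:len(MARKER)] = bytes(len(MARKER))
print(len(recover(data)), len(recover(bytes(damaged))))
```

The sketch has no checksums, so damage inside a block body could still be misread; it only shows how a repeated marker and schema copy let a reader start over at the next block.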



[jira] Updated: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-2055:
----------------------------------

    Attachment: HBASE-2055-v2.patch

v2 patch passes all tests. Also, in this version we write the schema as a file header and use it to initialize the reader. 

In case anyone is curious, we are not using Avro's bundled file I/O package because its file format puts the schema and metadata into a trailer, so it seems unsuitable for a log that may be truncated as part of "normal" operation.
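
The header-versus-trailer point can be shown with a small Python sketch. This is an illustration under assumptions, not the v2 patch and not Avro's file format: the JSON-plus-length framing, the function names, and the WALEntry schema are hypothetical. With the schema up front, a truncated file still tells the reader how to decode what remains; with the schema at the end, truncation removes exactly the part the reader needs first.

```python
import json
import struct

# Hypothetical schema, used only for this illustration.
SCHEMA = {
    "type": "record",
    "name": "WALEntry",
    "fields": [{"name": "edit", "type": "bytes"}],
}
RECORDS = [b"put row1", b"put row2", b"delete row1"]


def _frame(payload: bytes) -> bytes:
    """Length-prefix a payload with a 4-byte big-endian size."""
    return struct.pack(">I", len(payload)) + payload


def write_with_header(schema, records) -> bytes:
    """Schema first, then the records."""
    out = bytearray(_frame(json.dumps(schema).encode("utf-8")))
    for r in records:
        out += _frame(r)
    return bytes(out)


def write_with_trailer(schema, records) -> bytes:
    """Records first; the schema and its length come last."""
    out = bytearray()
    for r in records:
        out += _frame(r)
    tail = json.dumps(schema).encode("utf-8")
    out += tail + struct.pack(">I", len(tail))
    return bytes(out)


def read_schema_from_header(data: bytes):
    if len(data) < 4:
        return None
    (n,) = struct.unpack_from(">I", data, 0)
    if 4 + n > len(data):
        return None
    return json.loads(data[4:4 + n].decode("utf-8"))


def read_schema_from_trailer(data: bytes):
    if len(data) < 4:
        return None
    (n,) = struct.unpack_from(">I", data, len(data) - 4)
    if n + 4 > len(data):
        return None  # the length field is missing or garbage
    try:
        return json.loads(data[len(data) - 4 - n:len(data) - 4].decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None


header_file = write_with_header(SCHEMA, RECORDS)
trailer_file = write_with_trailer(SCHEMA, RECORDS)

# Cut five bytes off the end of each file, as a crash might.
print(read_schema_from_header(header_file[:-5]) == SCHEMA)
print(read_schema_from_trailer(trailer_file[:-5]))
```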



[jira] Updated: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-2055:
----------------------------------

    Attachment:     (was: HBASE-2055.patch)



[jira] Updated: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-2055:
----------------------------------

    Attachment: HBASE-2055.patch

No idea if it works yet.



[jira] Commented: (HBASE-2055) Serialize WAL as Avro records

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794181#action_12794181 ] 

Jeff Hammerbacher commented on HBASE-2055:
------------------------------------------

Hey Andy,

Grab the latest Java patch from https://issues.apache.org/jira/browse/AVRO-160. The new file format puts metadata in the header rather than the footer. For future maintenance, it may be easier to stick with the default Avro file object container.

Later,
Jeff



[jira] Commented: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792963#action_12792963 ] 

Andrew Purtell commented on HBASE-2055:
---------------------------------------

I put up a patch which implements an HLog reader and writer that use Avro for serialization. Some basic functionality works, but TestHLog fails all three cases with EOFExceptions, always when reading the last field of a particular (truncated?) record. Surprisingly, TestFullLogReconstruction succeeds.



[jira] Updated: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-2055:
----------------------------------

    Attachment: test-site.patch
                HBASE-2055.patch



[jira] Commented: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794366#action_12794366 ] 

Andrew Purtell commented on HBASE-2055:
---------------------------------------

Sorry, above I meant SYNC_INTERVAL, not SYNC_SIZE. Also it looks like the DataFileWriter as implemented for AVRO-160 will hold up to SYNC_INTERVAL bytes in a buffer before writing out the block. We want to hsync after a group of related commits in the WAL whether SYNC_INTERVAL is reached or not, but also have the stream marked with a sync marker at each SYNC_INTERVAL. This is basically what my v3 or v4 patch does. It also writes a copy of the schema just after the sync marker so we have an opportunity to resynchronize a reader on each block regardless of how many previous blocks are corrupt (perhaps all). 
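
The two triggers described here can be separated in a few lines of Python. This is a sketch under assumptions, not the v3/v4 patch and not Avro's DataFileWriter: the WalWriter class, the fixed MARKER bytes, and the flush() call standing in for hsync are all made up for illustration, and the schema copy after each marker is left out to keep it short. A marker goes into the stream once SYNC_INTERVAL bytes have been written since the last one, while sync() pushes buffered edits out immediately no matter how far along the interval is.

```python
import io

# Hypothetical 16-byte sync marker, for illustration only.
MARKER = bytes(range(0xF0, 0x100))


class WalWriter:
    """Buffers edits; markers follow the interval, flushes follow commits."""

    def __init__(self, out, sync_interval: int = 64 * 1024):
        self.out = out
        self.sync_interval = sync_interval
        self.pending = bytearray()
        self.since_marker = 0
        self.flushes = 0

    def append(self, edit: bytes) -> None:
        # Interval trigger: mark the stream, but do not force a flush.
        if self.since_marker >= self.sync_interval:
            self.pending += MARKER
            self.since_marker = 0
        self.pending += edit
        self.since_marker += len(edit)

    def sync(self) -> None:
        # Commit trigger: write out whatever is buffered right now,
        # whether or not the interval has been reached.
        self.out.write(bytes(self.pending))
        self.out.flush()  # stand-in for hsync on a real filesystem stream
        self.pending.clear()
        self.flushes += 1


sink = io.BytesIO()
wal = WalWriter(sink, sync_interval=16)  # tiny interval for the demo

# First group of related commits: well under the interval, still flushed.
wal.append(b"aaaa")
wal.append(b"bbbb")
wal.sync()

# Second group crosses the interval, so a marker lands mid-group.
for edit in (b"cccc", b"dddd", b"eeee"):
    wal.append(edit)
wal.sync()

print(sink.getvalue())
```

Keeping the two triggers independent means durability follows the commit groups while the marker spacing stays regular for readers that need to resynchronize.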



[jira] Updated: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-2055:
----------------------------------

    Attachment: paranamer-1.5.jar
                jackson-mapper-asl-1.0.1.jar
                jackson-core-asl-1.0.1.jar



[jira] Commented: (HBASE-2055) Serialize WAL as Avro records

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792974#action_12792974 ] 

ryan rawson commented on HBASE-2055:
------------------------------------

I am wondering what would happen if the header was mangled?

Does it make sense to put the schema in multiple places, like superblocks in ext3?



[jira] Assigned: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reassigned HBASE-2055:
-------------------------------------

    Assignee:     (was: Andrew Purtell)



[jira] Updated: (HBASE-2055) Serialize WAL as Avro records

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-2055:
----------------------------------

    Attachment: TEST-org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.txt.gz
                TEST-org.apache.hadoop.hbase.TestFullLogReconstruction.txt.gz
                TEST-org.apache.hadoop.hbase.regionserver.wal.TestHLog.txt.gz



[jira] Commented: (HBASE-2055) Serialize WAL as Avro records

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795440#action_12795440 ] 

Doug Cutting commented on HBASE-2055:
-------------------------------------

I agree that SYNC_INTERVAL should be configurable.

Note that the current plan is to support appends but no longer to support changing the schema in a file.  The schema is included only once, at the start of the file.  If you have further comments, please add them to AVRO-160.

