Posted to dev@avro.apache.org by "Scott Carey (JIRA)" <ji...@apache.org> on 2010/02/12 19:09:28 UTC

[jira] Created: (AVRO-414) Binary File concatenate and block-based apppend / compress / uncompress

Binary File concatenate and block-based apppend / compress / uncompress
-----------------------------------------------------------------------

                 Key: AVRO-414
                 URL: https://issues.apache.org/jira/browse/AVRO-414
             Project: Avro
          Issue Type: New Feature
            Reporter: Scott Carey
            Assignee: Scott Carey


The block-based format of the binary file allows for block-level operations that do not decode or encode data.

Two such use cases are:

* Change the compression codec or compression level of a file without decoding.
* Concatenate two files with identical schemas together quickly.
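These operations can skip decoding because of how the container file lays out its blocks. Per the Avro spec, each data block is a record count and a byte size (both zig-zag varint longs), then the codec-encoded records, then the file's 16-byte sync marker. The sketch below is hypothetical: RawBlock and BlockConcat are illustrative names, not Avro classes. It covers only the block section and assumes the caller has already written the file header and checked that the schemas and codecs match. Concatenation copies each source block's bytes verbatim and re-frames them with the destination file's sync marker.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Avro's DataFileWriter: one data block kept in encoded form.
final class RawBlock {
  final long count;  // number of records in the block
  final byte[] data; // codec output; the code below never decodes it
  RawBlock(long count, byte[] data) { this.count = count; this.data = data; }
}

final class BlockConcat {
  // Zig-zag varint encoding of a long, used for block counts and sizes.
  static void writeLong(ByteArrayOutputStream out, long n) {
    long z = (n << 1) ^ (n >> 63);
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));
      z >>>= 7;
    }
    out.write((int) z);
  }

  // Inverse of writeLong.
  static long readLong(ByteBuffer in) {
    long z = 0;
    int shift = 0;
    int b;
    do {
      b = in.get() & 0xFF;
      z |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return (z >>> 1) ^ -(z & 1);
  }

  // Frames each block as: count, byte size, data, then the writing file's sync marker.
  static byte[] writeBlocks(List<RawBlock> blocks, byte[] sync) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (RawBlock b : blocks) {
      writeLong(out, b.count);
      writeLong(out, b.data.length);
      out.write(b.data, 0, b.data.length);
      out.write(sync, 0, sync.length);
    }
    return out.toByteArray();
  }

  // Splits a block section into raw blocks, verifying each trailing sync marker.
  static List<RawBlock> readBlocks(byte[] section, byte[] sync) {
    ByteBuffer in = ByteBuffer.wrap(section);
    List<RawBlock> blocks = new ArrayList<>();
    while (in.hasRemaining()) {
      long count = readLong(in);
      byte[] data = new byte[(int) readLong(in)];
      in.get(data); // copy the encoded bytes as-is
      byte[] marker = new byte[sync.length];
      in.get(marker);
      if (!Arrays.equals(marker, sync)) throw new IllegalStateException("sync marker mismatch");
      blocks.add(new RawBlock(count, data));
    }
    return blocks;
  }

  // Appends the source's blocks to the destination's, re-framed with the destination sync marker.
  static byte[] concat(byte[] dest, byte[] destSync, byte[] src, byte[] srcSync) {
    List<RawBlock> blocks = readBlocks(dest, destSync);
    blocks.addAll(readBlocks(src, srcSync));
    return writeBlocks(blocks, destSync);
  }
}
```

The source sync marker is only used to validate the source; it is never copied. Every block in the output must end with the destination's own marker so that a reader of the concatenated file can still resynchronize.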

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-414) Binary File concatenate and block-based apppend / compress / uncompress

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-414:
-----------------------------

    Attachment: AVRO-414_plus_392.patch

When AVRO-392 is committed, I'll recreate a patch with only AVRO-414 in it.  

This patch contains both, which should allow for review in advance of AVRO-392 if anyone wishes to do so.

> Binary File concatenate and block-based apppend / compress / uncompress
> -----------------------------------------------------------------------
>
>                 Key: AVRO-414
>                 URL: https://issues.apache.org/jira/browse/AVRO-414
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-414.patch, AVRO-414_plus_392.patch
>
>
> The block based format of the binary file allows for block-based operations that do not decode or encode data.
> Two such use cases are:
> * Change the compression codec or compression level of a file without decoding.
> * Concatenate two files with identical schemas together quickly.



[jira] Updated: (AVRO-414) Binary File concatenate and block-based apppend / compress / uncompress

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-414:
-----------------------------

    Attachment: AVRO-414.patch

This patch depends on AVRO-392 and will apply after either of the last two patches there (with or without DirectBinaryDecoder).

This adds an "appendAllFrom(DataFileStream)" method to DataFileWriter. It uses a new package-private, block-based file iterator in DataFileStream, which DataFileStream now also uses for regular record iteration. As a result, the Codec API has been simplified: the inputs to and outputs from encoding are ByteBuffers.

A new test class tests concatenation with various codec combinations. 
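A minimal sketch of the ByteBuffer-in, ByteBuffer-out codec shape described above, and of changing a block's codec without decoding its records. BlockCodec, NullBlockCodec, DeflateBlockCodec, and Recompress are hypothetical names, not Avro's actual Codec API; the deflate codec here uses java.util.zip with the default zlib wrapper for simplicity. When the source and destination codecs match, the block is copied verbatim; otherwise it is decompressed and recompressed as opaque bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical codec shape: ByteBuffer in, ByteBuffer out (not Avro's class).
interface BlockCodec {
  String name();
  ByteBuffer compress(ByteBuffer raw);
  ByteBuffer decompress(ByteBuffer compressed);
}

final class NullBlockCodec implements BlockCodec {
  public String name() { return "null"; }
  public ByteBuffer compress(ByteBuffer raw) { return raw.duplicate(); }
  public ByteBuffer decompress(ByteBuffer compressed) { return compressed.duplicate(); }
}

final class DeflateBlockCodec implements BlockCodec {
  private final int level; // java.util.zip compression level, 0..9
  DeflateBlockCodec(int level) { this.level = level; }
  public String name() { return "deflate-" + level; }

  public ByteBuffer compress(ByteBuffer raw) {
    Deflater deflater = new Deflater(level);
    deflater.setInput(toArray(raw));
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return ByteBuffer.wrap(out.toByteArray());
  }

  public ByteBuffer decompress(ByteBuffer compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(toArray(compressed));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalArgumentException("truncated deflate block");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("corrupt deflate block", e);
    } finally {
      inflater.end();
    }
    return ByteBuffer.wrap(out.toByteArray());
  }

  // Copies the buffer's remaining bytes without moving its position.
  private static byte[] toArray(ByteBuffer b) {
    byte[] a = new byte[b.remaining()];
    b.duplicate().get(a);
    return a;
  }
}

final class Recompress {
  // Changes a block's codec; the serialized records inside are never decoded.
  static ByteBuffer transcode(ByteBuffer block, BlockCodec from, BlockCodec to) {
    if (from.name().equals(to.name())) return block.duplicate(); // same codec: copy verbatim
    return to.compress(from.decompress(block));
  }
}
```

Exercising every pair of codecs, as the test below does, is a cheap way to cover the copy path and the recompress path together.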

> Binary File concatenate and block-based apppend / compress / uncompress
> -----------------------------------------------------------------------
>
>                 Key: AVRO-414
>                 URL: https://issues.apache.org/jira/browse/AVRO-414
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>         Attachments: AVRO-414.patch
>
>
> The block based format of the binary file allows for block-based operations that do not decode or encode data.
> Two such use cases are:
> * Change the compression codec or compression level of a file without decoding.
> * Concatenate two files with identical schemas together quickly.



[jira] Updated: (AVRO-414) Binary File concatenate and block-based apppend / compress / uncompress

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-414:
------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Scott!

> Binary File concatenate and block-based apppend / compress / uncompress
> -----------------------------------------------------------------------
>
>                 Key: AVRO-414
>                 URL: https://issues.apache.org/jira/browse/AVRO-414
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-414.patch, AVRO-414.patch, AVRO-414_plus_392.patch
>
>
> The block based format of the binary file allows for block-based operations that do not decode or encode data.
> Two such use cases are:
> * Change the compression codec or compression level of a file without decoding.
> * Concatenate two files with identical schemas together quickly.



[jira] Commented: (AVRO-414) Binary File concatenate and block-based apppend / compress / uncompress

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835125#action_12835125 ] 

Scott Carey commented on AVRO-414:
----------------------------------

The last patch includes the latest AVRO-392.patch.  The previous one no longer applies cleanly.

> Binary File concatenate and block-based apppend / compress / uncompress
> -----------------------------------------------------------------------
>
>                 Key: AVRO-414
>                 URL: https://issues.apache.org/jira/browse/AVRO-414
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-414.patch, AVRO-414_plus_392.patch
>
>
> The block based format of the binary file allows for block-based operations that do not decode or encode data.
> Two such use cases are:
> * Change the compression codec or compression level of a file without decoding.
> * Concatenate two files with identical schemas together quickly.



[jira] Commented: (AVRO-414) Binary File concatenate and block-based apppend / compress / uncompress

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836003#action_12836003 ] 

Scott Carey commented on AVRO-414:
----------------------------------

bq. Scott, how critical is this for 1.3? This patch no longer applies cleanly.

I'll be using this in production either way: 1.3 patched with this, or 1.3 official.
If it's not in 1.3, that's not critical, but there is appeal to running without a patch. If producing an RC today is important and this can't make it, I'm OK with that.
Unlike AVRO-392, which had wide scope and changed semantics, this change is limited in scope. It just improves the file format code a bit, increases test coverage, and adds some extra functionality.

It may not be that quick to review, since it changes the file reader to use an internal iterator over blocks, so that classes in the same package can choose to iterate either by block or by record. That required some non-trivial changes. The new TestDataFileConcat class adds coverage of the file format, and while adding these tests I discovered a couple of corner-case bugs and fixed them (such as AVRO-407).
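One way to let a reader iterate either by block or by record is to layer the record iterator on top of a block iterator, as in this sketch. EncodedBlock and RecordIterator are hypothetical names, not Avro's DataFileStream internals, and the per-block decoder is passed in as a function because real record decoding depends on the schema. Block-oriented callers such as concatenation consume the Iterator<EncodedBlock> directly and never pay for decoding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical encoded block: record count plus opaque bytes.
final class EncodedBlock {
  final int count;
  final byte[] bytes;
  EncodedBlock(int count, byte[] bytes) { this.count = count; this.bytes = bytes; }
}

// Record-level view built on a block-level iterator; blocks are decoded lazily, one at a time.
final class RecordIterator<T> implements Iterator<T> {
  private final Iterator<EncodedBlock> blocks;
  private final Function<EncodedBlock, List<T>> decodeBlock; // hypothetical schema-aware decoder
  private Iterator<T> current = Collections.emptyIterator();

  RecordIterator(Iterator<EncodedBlock> blocks, Function<EncodedBlock, List<T>> decodeBlock) {
    this.blocks = blocks;
    this.decodeBlock = decodeBlock;
  }

  @Override
  public boolean hasNext() {
    // Skip over empty blocks until a record is available or the blocks run out.
    while (!current.hasNext() && blocks.hasNext()) {
      EncodedBlock block = blocks.next();
      List<T> decoded = decodeBlock.apply(block);
      if (decoded.size() != block.count) throw new IllegalStateException("block count mismatch");
      current = decoded.iterator();
    }
    return current.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) throw new NoSuchElementException();
    return current.next();
  }
}
```

Because both views draw from the same block iterator, a single block-reading path serves both uses, which is the point of routing regular iteration through it as well.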


> Binary File concatenate and block-based apppend / compress / uncompress
> -----------------------------------------------------------------------
>
>                 Key: AVRO-414
>                 URL: https://issues.apache.org/jira/browse/AVRO-414
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-414.patch, AVRO-414.patch, AVRO-414_plus_392.patch
>
>
> The block based format of the binary file allows for block-based operations that do not decode or encode data.
> Two such use cases are:
> * Change the compression codec or compression level of a file without decoding.
> * Concatenate two files with identical schemas together quickly.



[jira] Commented: (AVRO-414) Binary File concatenate and block-based apppend / compress / uncompress

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835988#action_12835988 ] 

Doug Cutting commented on AVRO-414:
-----------------------------------

Scott, how critical is this for 1.3?  This patch no longer applies cleanly.

> Binary File concatenate and block-based apppend / compress / uncompress
> -----------------------------------------------------------------------
>
>                 Key: AVRO-414
>                 URL: https://issues.apache.org/jira/browse/AVRO-414
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-414.patch, AVRO-414_plus_392.patch
>
>
> The block based format of the binary file allows for block-based operations that do not decode or encode data.
> Two such use cases are:
> * Change the compression codec or compression level of a file without decoding.
> * Concatenate two files with identical schemas together quickly.



[jira] Updated: (AVRO-414) Binary File concatenate and block-based apppend / compress / uncompress

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-414:
-----------------------------

    Component/s: java

> Binary File concatenate and block-based apppend / compress / uncompress
> -----------------------------------------------------------------------
>
>                 Key: AVRO-414
>                 URL: https://issues.apache.org/jira/browse/AVRO-414
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>
> The block based format of the binary file allows for block-based operations that do not decode or encode data.
> Two such use cases are:
> * Change the compression codec or compression level of a file without decoding.
> * Concatenate two files with identical schemas together quickly.



[jira] Updated: (AVRO-414) Binary File concatenate and block-based apppend / compress / uncompress

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-414:
-----------------------------

    Fix Version/s: 1.3.0
           Status: Patch Available  (was: Open)

It would be great to get this in 1.3.

> Binary File concatenate and block-based apppend / compress / uncompress
> -----------------------------------------------------------------------
>
>                 Key: AVRO-414
>                 URL: https://issues.apache.org/jira/browse/AVRO-414
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-414.patch
>
>
> The block based format of the binary file allows for block-based operations that do not decode or encode data.
> Two such use cases are:
> * Change the compression codec or compression level of a file without decoding.
> * Concatenate two files with identical schemas together quickly.



[jira] Updated: (AVRO-414) Binary File concatenate and block-based apppend / compress / uncompress

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-414:
-----------------------------

    Attachment: AVRO-414.patch

Patch updated to apply to the latest trunk (after AVRO-392).

> Binary File concatenate and block-based apppend / compress / uncompress
> -----------------------------------------------------------------------
>
>                 Key: AVRO-414
>                 URL: https://issues.apache.org/jira/browse/AVRO-414
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-414.patch, AVRO-414.patch, AVRO-414_plus_392.patch
>
>
> The block based format of the binary file allows for block-based operations that do not decode or encode data.
> Two such use cases are:
> * Change the compression codec or compression level of a file without decoding.
> * Concatenate two files with identical schemas together quickly.
