Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2010/09/08 19:54:33 UTC

[jira] Created: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API

Java: Add InputFormat for SequenceFiles using Reflect API
---------------------------------------------------------

                 Key: AVRO-662
                 URL: https://issues.apache.org/jira/browse/AVRO-662
             Project: Avro
          Issue Type: New Feature
          Components: java
            Reporter: Doug Cutting
            Assignee: Doug Cutting


It would be useful to be able to read SequenceFile-based data into an Avro-based Java MapReduce program.  Once the reflect, specific, and generic representations are fully compatible (AVRO-638), a RecordReader for SequenceFiles could be added that uses Avro's reflect representation.  AvroOutputFormat could also be changed to accept such reflected data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-662:
------------------------------

    Attachment: AVRO-662.patch

Here's a patch that adds this feature.  A SequenceFileInputFormat is added that presents sequence file data in a form compatible with Avro's MapReduce API.  In particular, primitive Writable types (LongWritable, Text, etc.) are converted to corresponding Avro types (Long, CharSequence, etc.), while reflection is used to infer a schema for complex Writables.  The Writable implementation must be available at runtime, of course.
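
The conversion described above can be sketched roughly as follows. This is a minimal stand-alone illustration, not the code in the patch: the Stub* classes are hypothetical stand-ins for Hadoop's LongWritable, Text, and a complex Writable, so that the example runs without Hadoop or Avro on the classpath, and describeFields is only a rough analogue of schema inference (in real code, something like Avro's ReflectData.get().getSchema(SomeClass.class) would presumably do that job).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: all Stub* types are hypothetical stand-ins, not Hadoop classes.
class WritableConversionSketch {

  // Stand-in for a primitive wrapper such as Hadoop's LongWritable.
  static final class StubLongWritable {
    private final long value;
    StubLongWritable(long value) { this.value = value; }
    long get() { return value; }
  }

  // Stand-in for Hadoop's Text.
  static final class StubText {
    private final String value;
    StubText(String value) { this.value = value; }
    @Override public String toString() { return value; }
  }

  // Stand-in for a complex Writable with no primitive equivalent.
  static final class StubPageView {
    String url;
    long timestamp;
    StubPageView(String url, long timestamp) {
      this.url = url;
      this.timestamp = timestamp;
    }
  }

  // Primitive wrappers become plain values; anything else is passed through
  // unchanged so that it can be handled reflectively.
  static Object toAvroValue(Object writable) {
    if (writable instanceof StubLongWritable) {
      return ((StubLongWritable) writable).get();  // boxed as Long
    }
    if (writable instanceof StubText) {
      return writable.toString();                  // String is a CharSequence
    }
    return writable;
  }

  // Rough analogue of schema inference: maps each instance field name to its
  // declared type name, sorted by field name.
  static Map<String, String> describeFields(Class<?> type) {
    Map<String, String> fields = new TreeMap<>();
    for (Field f : type.getDeclaredFields()) {
      int mods = f.getModifiers();
      if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
        continue;  // skip fields that are not part of the record's state
      }
      fields.put(f.getName(), f.getType().getSimpleName());
    }
    return fields;
  }

  public static void main(String[] args) {
    System.out.println(toAvroValue(new StubLongWritable(42L)));
    System.out.println(toAvroValue(new StubText("hello")));
    System.out.println(describeFields(StubPageView.class));
  }
}
```

The reflective fallback is also why the Writable implementation has to be on the classpath at runtime: without the class there are no fields to inspect.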

I also abstracted a FileReader interface and added a SequenceFileReader implementation.  This permits easier integration of SequenceFile and other formats into Avro tools.  For example, it would now be a simple matter to extend Avro's 'tojson' command to also dump SequenceFile data as JSON.
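
To illustrate why a shared reader interface makes this kind of tool integration easy, here is a small sketch. The names RecordSource, InMemoryPairSource, and toJsonLines are hypothetical and exist only for this example; the interface is much simpler than Avro's actual FileReader, and the in-memory source merely stands in for a reader over SequenceFile key/value pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Sketch only: a simplified, hypothetical reader abstraction.
class ReaderAbstractionSketch {

  // Anything that can iterate over records and be closed afterwards.
  interface RecordSource<D> extends Iterable<D>, AutoCloseable {
    @Override void close();  // narrowed so callers need not handle Exception
  }

  // Stand-in for a reader that yields key/value pairs, as a SequenceFile would.
  static final class InMemoryPairSource implements RecordSource<Map.Entry<Long, String>> {
    private final List<Map.Entry<Long, String>> pairs;
    InMemoryPairSource(List<Map.Entry<Long, String>> pairs) { this.pairs = pairs; }
    @Override public Iterator<Map.Entry<Long, String>> iterator() { return pairs.iterator(); }
    @Override public void close() { }
  }

  // Naive string quoting: escapes only backslashes and double quotes,
  // which is enough for this sketch but not for arbitrary input.
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // The "tool" depends only on the interface, so any format that provides
  // a RecordSource of pairs can be dumped without further changes.
  static List<String> toJsonLines(RecordSource<Map.Entry<Long, String>> source) {
    List<String> lines = new ArrayList<>();
    try (RecordSource<Map.Entry<Long, String>> s = source) {
      for (Map.Entry<Long, String> pair : s) {
        lines.add("{\"key\": " + pair.getKey() + ", \"value\": " + quote(pair.getValue()) + "}");
      }
    }
    return lines;
  }

  public static void main(String[] args) {
    List<Map.Entry<Long, String>> pairs = new ArrayList<>();
    pairs.add(new SimpleEntry<>(1L, "a"));
    pairs.add(new SimpleEntry<>(2L, "say \"hi\""));
    for (String line : toJsonLines(new InMemoryPairSource(pairs))) {
      System.out.println(line);
    }
  }
}
```

Supporting another file format in such a tool then amounts to writing one more RecordSource implementation; the dumping code itself stays untouched.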

> Java: Add InputFormat for SequenceFiles using Reflect API
> ---------------------------------------------------------
>
>                 Key: AVRO-662
>                 URL: https://issues.apache.org/jira/browse/AVRO-662
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.4.1
>
>         Attachments: AVRO-662.patch
>
>
> It would be useful to be able to read SequenceFile-based data into an Avro-based Java MapReduce program.  Once the reflect, specific, and generic representations are fully compatible (AVRO-638), a RecordReader for SequenceFiles could be added that uses Avro's reflect representation.  AvroOutputFormat could also be changed to accept such reflected data.


[jira] Updated: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-662:
------------------------------

           Status: Patch Available  (was: Open)
    Fix Version/s: 1.4.1


[jira] Commented: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910391#action_12910391 ] 

Doug Cutting commented on AVRO-662:
-----------------------------------

If no one objects, I will commit this soon.


[jira] Commented: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910424#action_12910424 ] 

Scott Carey commented on AVRO-662:
----------------------------------

A quick review looks good.  I did not go into complete depth on SequenceFileReader though.

+1

The current unit test covers 85% of the LOC in SequenceFileReader, but it doesn't touch several data types and always hits the WRITABLE_SCHEMAS cache.


[jira] Updated: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-662:
------------------------------

          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
      Resolution: Fixed

I committed this.
