Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2010/09/08 19:54:33 UTC
[jira] Created: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API
Java: Add InputFormat for SequenceFiles using Reflect API
---------------------------------------------------------
Key: AVRO-662
URL: https://issues.apache.org/jira/browse/AVRO-662
Project: Avro
Issue Type: New Feature
Components: java
Reporter: Doug Cutting
Assignee: Doug Cutting
It would be useful to be able to read SequenceFile-based data into an Avro-based Java MapReduce program. Once the reflect, specific and generic representations are fully compatible (AVRO-638), a RecordReader for SequenceFiles could be added that uses Avro's reflect representation. AvroOutputFormat could also be changed to accept such reflected data.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated AVRO-662:
------------------------------
Attachment: AVRO-662.patch
Here's a patch that adds this feature. A SequenceFileInputFormat is added that presents sequence file data in a form compatible with Avro's MapReduce API. In particular, primitive Writable types (LongWritable, Text, etc.) are converted to corresponding Avro types (Long, CharSequence, etc.), while reflection is used to infer a schema for complex Writables. The Writable implementation must be available at runtime, of course.
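To make the primitive-versus-complex split concrete, here is a minimal sketch of the type-level lookup, assuming a simple table keyed by Writable class name. The class name WritableSchemaSketch, the method schemaFor, and the "reflect(...)" placeholder are hypothetical and not taken from the patch; the real reader works with Avro Schema objects and invokes Avro's reflect API for complex Writables, and its exact primitive table may differ from the one assumed here.

```java
import java.util.HashMap;
import java.util.Map;

public class WritableSchemaSketch {
  // Hypothetical table: Hadoop Writable class name -> Avro primitive type name.
  // Assumed mapping for illustration; the patch's actual table may differ.
  private static final Map<String, String> PRIMITIVES = new HashMap<String, String>();
  static {
    PRIMITIVES.put("org.apache.hadoop.io.NullWritable", "null");
    PRIMITIVES.put("org.apache.hadoop.io.BooleanWritable", "boolean");
    PRIMITIVES.put("org.apache.hadoop.io.IntWritable", "int");
    PRIMITIVES.put("org.apache.hadoop.io.LongWritable", "long");
    PRIMITIVES.put("org.apache.hadoop.io.FloatWritable", "float");
    PRIMITIVES.put("org.apache.hadoop.io.DoubleWritable", "double");
    PRIMITIVES.put("org.apache.hadoop.io.BytesWritable", "bytes");
    PRIMITIVES.put("org.apache.hadoop.io.Text", "string");
  }

  // Returns a JSON schema string for a primitive Writable, or a placeholder
  // marking where reflection-based inference would be used for anything else.
  static String schemaFor(String writableClass) {
    String primitive = PRIMITIVES.get(writableClass);
    if (primitive != null) {
      return "\"" + primitive + "\"";
    }
    // Placeholder only: a real reader would infer a record schema from the
    // Writable's fields via reflection, which needs the class at runtime.
    return "reflect(" + writableClass + ")";
  }

  public static void main(String[] args) {
    System.out.println(schemaFor("org.apache.hadoop.io.LongWritable"));
    System.out.println(schemaFor("org.apache.hadoop.io.Text"));
    System.out.println(schemaFor("com.example.MyWritable")); // hypothetical class
  }
}
```

The fallback branch is also why the Writable implementation must be on the classpath: reflection needs the actual class to inspect its fields.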
I also abstracted a FileReader interface and added a SequenceFileReader implementation. This permits easier integration of SequenceFile and other formats into Avro tools. For example, it would now be a simple matter to extend Avro's 'tojson' command to also dump SequenceFile data as JSON.
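A rough illustration of why a common reader interface helps tools: a 'tojson'-style dump written only against the interface works for any format that implements it. This is a simplified, hypothetical stand-in. RecordFileReader, ListReader and dump are invented names. Avro's actual FileReader interface exposes an Avro Schema and additional methods, and its close() may throw IOException; here close() is narrowed to throw nothing and the schema is a JSON string so the example stays self-contained.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FileReaderSketch {
  // Hypothetical format-neutral reader: iterate data and report a schema.
  interface RecordFileReader<D> extends Iterator<D>, Closeable {
    String getSchemaJson();

    @Override
    void close(); // narrowed: no checked exception in this sketch
  }

  // Hypothetical in-memory implementation standing in for a
  // SequenceFile-backed (or any other format-backed) reader.
  static class ListReader<D> implements RecordFileReader<D> {
    private final String schema;
    private final Iterator<D> it;

    ListReader(String schema, List<D> items) {
      this.schema = schema;
      this.it = items.iterator();
    }

    public String getSchemaJson() { return schema; }
    public boolean hasNext() { return it.hasNext(); }
    public D next() { return it.next(); }
    public void remove() { throw new UnsupportedOperationException(); }
    public void close() { }
  }

  // Minimal JSON string quoting: escapes backslashes and double quotes only.
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // A 'tojson'-style dump that depends only on the interface, one datum per line.
  static String dump(RecordFileReader<?> reader) {
    StringBuilder out = new StringBuilder();
    try (RecordFileReader<?> r = reader) {
      while (r.hasNext()) {
        Object datum = r.next();
        out.append(datum instanceof CharSequence ? quote(datum.toString()) : String.valueOf(datum));
        out.append('\n');
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    RecordFileReader<Long> longs = new ListReader<Long>("\"long\"", Arrays.asList(1L, 2L));
    System.out.print(dump(longs));
    RecordFileReader<String> texts = new ListReader<String>("\"string\"", Arrays.asList("a", "b"));
    System.out.print(dump(texts));
  }
}
```

Adding a new input format to such a tool would then only require another implementation of the interface, with no change to dump itself.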
[jira] Updated: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated AVRO-662:
------------------------------
Status: Patch Available (was: Open)
Fix Version/s: 1.4.1
[jira] Commented: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910391#action_12910391 ]
Doug Cutting commented on AVRO-662:
-----------------------------------
If no one objects, I will commit this soon.
[jira] Commented: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API
Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910424#action_12910424 ]
Scott Carey commented on AVRO-662:
----------------------------------
A quick review looks good. I did not go into complete depth on SequenceFileReader though.
+1
The current unit test covers 85% of the lines of code in SequenceFileReader, but it doesn't exercise several data types and always hits the WRITABLE_SCHEMAS cache.
[jira] Updated: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated AVRO-662:
------------------------------
Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed
I committed this.