Posted to dev@crunch.apache.org by "Gabriel Reid (JIRA)" <ji...@apache.org> on 2014/07/06 21:11:34 UTC

[jira] [Updated] (CRUNCH-433) Add support for reading specific/reflect data from an Avro MR file

     [ https://issues.apache.org/jira/browse/CRUNCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid updated CRUNCH-433:
--------------------------------

    Attachment: CRUNCH-433.patch

This patch introduces a new Avro PTableType for reading and writing files of AvroKeyValue records, compatible with the files produced and consumed by jobs configured via org.apache.avro.mapreduce.AvroJob.

It also adds methods to the From class for reading Avro key/value files directly as a PTable.

> Add support for reading specific/reflect data from an Avro MR file
> ------------------------------------------------------------------
>
>                 Key: CRUNCH-433
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-433
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: CRUNCH-433.patch
>
>
> An Avro Key/Value file written via raw MapReduce contains records that follow the schema generated by the org.apache.avro.hadoop.io.AvroKeyValue class.
> If these files contain specific or reflection-based records, there is currently no easy way to read them in as specific or reflection records. With the basic public Crunch APIs, they can only be read as generic records whose key and value fields are themselves generic records.
> A method should be added to the Avros class that allows callers to specify the PTypes used to read the underlying key and value types in a raw MR output file.
> Link to related discussion that inspired this ticket on the user list: http://s.apache.org/es
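
For readers unfamiliar with the file layout described in the issue, here is a rough sketch of the wrapper record that each entry in such a file follows. It builds the schema JSON by hand using only the Java standard library. The record name (KeyValuePair), namespace (org.apache.avro.mapreduce), and field names (key, value) are given as I recall them from the Avro library and should be treated as assumptions to verify against AvroKeyValue.getSchema(...). The class KeyValueSchemaSketch and the com.example.Person record are hypothetical and are not part of the patch.

```java
// Sketch only: builds the JSON for the key/value wrapper record by hand,
// without the Avro library. The record, namespace, and field names are
// assumptions to verify against org.apache.avro.hadoop.io.AvroKeyValue.
final class KeyValueSchemaSketch {

    // keySchemaJson and valueSchemaJson must already be valid Avro schema
    // JSON, e.g. "\"string\"" for a primitive or a full record definition.
    static String keyValueSchema(String keySchemaJson, String valueSchemaJson) {
        return "{\"type\":\"record\","
            + "\"name\":\"KeyValuePair\","
            + "\"namespace\":\"org.apache.avro.mapreduce\","
            + "\"fields\":["
            + "{\"name\":\"key\",\"type\":" + keySchemaJson + "},"
            + "{\"name\":\"value\",\"type\":" + valueSchemaJson + "}"
            + "]}";
    }

    public static void main(String[] args) {
        // Hypothetical specific record used as the value type.
        String personSchema = "{\"type\":\"record\",\"name\":\"Person\","
            + "\"namespace\":\"com.example\","
            + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}";
        System.out.println(keyValueSchema("\"string\"", personSchema));
    }
}
```

Because the key and value schemas are nested inside the wrapper record, a reader that is only given the wrapper schema has no hint that the nested Person record should be materialized as a specific or reflect class, which is why it falls back to generic records. Supplying the key and value types separately, as the issue proposes, gives the reader that information.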


