Posted to dev@avro.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2013/07/22 19:14:48 UTC
[jira] [Created] (AVRO-1357) Allow to force reading generic records for input data and map output data
Xiangrui Meng created AVRO-1357:
-----------------------------------
Summary: Allow to force reading generic records for input data and map output data
Key: AVRO-1357
URL: https://issues.apache.org/jira/browse/AVRO-1357
Project: Avro
Issue Type: New Feature
Components: java
Affects Versions: 1.7.4
Reporter: Xiangrui Meng
In AvroJob/AvroInputFormat/AvroRecordReader, we can choose either SpecificDatumReader or ReflectDatumReader to read input data and map output data, but not GenericDatumReader. We may want to force reading generic records for some jobs.
For example, assume that the input records contain a field called "category" and we want to count the records in each category. If we could force reading generic records, we could get the category string by calling get("category"). Otherwise, an input record may be loaded either as a GenericRecord instance or as a SpecificRecord instance, depending on whether a generated class for its schema is on the classpath. A SpecificRecord does not implement GenericRecord, so get("category") is not available on it.
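As a rough sketch of the per-category count, the Java below uses a hypothetical `FieldAccess` interface as a stand-in for Avro's `GenericRecord` (modeling only its `get(String)` accessor) and a hypothetical map-backed `MapRecord` in place of `GenericData.Record`, so it runs without Avro on the classpath. It illustrates that the counting logic needs nothing beyond name-based field access, which is what forcing generic records would guarantee.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.avro.generic.GenericRecord:
// only the name-based accessor that the counting logic relies on.
interface FieldAccess {
    Object get(String fieldName);
}

// Hypothetical map-backed record, playing the role of GenericData.Record.
final class MapRecord implements FieldAccess {
    private final Map<String, Object> fields;

    MapRecord(Map<String, Object> fields) {
        this.fields = fields;
    }

    @Override
    public Object get(String fieldName) {
        return fields.get(fieldName);
    }
}

final class CategoryCounter {
    // Counts records per value of the "category" field. Generic Avro records
    // typically return strings as CharSequence (e.g. Utf8), so the value is
    // converted with String.valueOf to get a stable map key; a missing
    // category is counted under the key "null".
    static Map<String, Long> countByCategory(Iterable<? extends FieldAccess> records) {
        Map<String, Long> counts = new HashMap<>();
        for (FieldAccess record : records) {
            String key = String.valueOf(record.get("category"));
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

In an actual job, the same get("category") call would sit in the mapper, which would emit a (category, 1) pair per record for a reducer to sum, rather than counting in memory.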
To add this feature, we can replace the boolean settings IS_REFLECT/MAP_OUTPUT_IS_REFLECT with enum-valued settings named INPUT_AVRO_DESERIALIZATION_TYPE/MAP_OUTPUT_AVRO_DESERIALIZATION_TYPE, and return the DatumReader that corresponds to the configured type.
We can add setDeserializationType/setInputDeserializationType/setMapOutputDeserializationType to AvroJob while deprecating setReflect/setInputReflect/setMapOutputReflect.
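A minimal sketch of the proposed input-side setting, under stated assumptions: `AvroDeserializationType` and `JobSettings` are hypothetical names, the key strings are invented for illustration (only the constant names IS_REFLECT and INPUT_AVRO_DESERIALIZATION_TYPE come from the proposal), and a plain `Map` stands in for the Hadoop job configuration. It stores the enum by name, keeps the deprecated reflect setter as a thin wrapper over the new one, and falls back to the legacy boolean so existing job configurations keep their behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical enum for the proposed deserialization-type settings. Each
// constant records the name of the DatumReader it would select.
enum AvroDeserializationType {
    GENERIC("GenericDatumReader"),
    SPECIFIC("SpecificDatumReader"),
    REFLECT("ReflectDatumReader");

    final String readerClassName;

    AvroDeserializationType(String readerClassName) {
        this.readerClassName = readerClassName;
    }
}

// Hypothetical stand-in for the job configuration: a string-to-string map.
final class JobSettings {
    // Assumed key strings; only the constant names follow the proposal.
    static final String INPUT_AVRO_DESERIALIZATION_TYPE = "avro.input.deserialization.type";
    static final String IS_REFLECT = "avro.input.is.reflect";

    private final Map<String, String> conf = new HashMap<>();

    // New-style setter: stores the enum constant by name.
    void setInputDeserializationType(AvroDeserializationType type) {
        conf.put(INPUT_AVRO_DESERIALIZATION_TYPE, type.name());
    }

    // Old-style setter, kept for compatibility and routed through the new one.
    @Deprecated
    void setInputReflect() {
        setInputDeserializationType(AvroDeserializationType.REFLECT);
    }

    // Lets a test simulate a legacy configuration that only sets the boolean.
    void setRaw(String key, String value) {
        conf.put(key, value);
    }

    // Resolution order: explicit enum setting, then the legacy boolean,
    // then SPECIFIC (the reader used when reflection is not requested).
    AvroDeserializationType getInputDeserializationType() {
        String type = conf.get(INPUT_AVRO_DESERIALIZATION_TYPE);
        if (type != null) {
            return AvroDeserializationType.valueOf(type);
        }
        if (Boolean.parseBoolean(conf.get(IS_REFLECT))) {
            return AvroDeserializationType.REFLECT;
        }
        return AvroDeserializationType.SPECIFIC;
    }
}
```

The map-output side would mirror this with its own key, and setDeserializationType would set both. In a real change, each constant would select an actual DatumReader instance rather than just its class name; keeping that choice behind one enum avoids scattering boolean checks across AvroJob, AvroInputFormat, and AvroRecordReader.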
--
This message is automatically generated by JIRA.