Posted to commits@beam.apache.org by "Eugene Kirpichov (JIRA)" <ji...@apache.org> on 2017/07/24 22:08:00 UTC
[jira] [Created] (BEAM-2677) AvroIO.read without specifying a schema
Eugene Kirpichov created BEAM-2677:
--------------------------------------
Summary: AvroIO.read without specifying a schema
Key: BEAM-2677
URL: https://issues.apache.org/jira/browse/BEAM-2677
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Reporter: Eugene Kirpichov
Assignee: Eugene Kirpichov
Sometimes it is inconvenient to require the user of AvroIO.read/readAll to specify a Schema for the Avro files they are reading, especially if different files may have different schemas.
It is possible to read GenericRecord objects from an Avro file; however, it is not possible to provide a Coder for GenericRecord without knowing the schema. A GenericRecord knows its schema, so we can encode it into a byte array, but we cannot decode it from a byte array without knowing the schema (and encoding the full schema together with every record would be impractical).
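To illustrate the asymmetry, here is a minimal stdlib-only sketch. It does not use the real Avro or Beam classes: FieldType, encode and decode are hypothetical stand-ins, and the byte layout is only loosely modeled on a schema-ordered binary encoding. The point is that the encoded bytes carry neither field names nor type tags, so decode cannot be written without a schema argument.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of schema-dependent coding; NOT the org.apache.avro API.
public class SchemalessCodingSketch {
    // Hypothetical, deliberately tiny "schema": an ordered list of field types.
    enum FieldType { INT, STRING }

    // Writes only the values, in schema order: no field names, no type tags.
    static byte[] encode(List<FieldType> schema, List<Object> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int i = 0; i < schema.size(); i++) {
            if (schema.get(i) == FieldType.INT) {
                out.writeInt((Integer) values.get(i));
            } else {
                out.writeUTF((String) values.get(i));
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Decoding has to be told the schema: the bytes alone do not say
    // how many fields there are or how each one should be interpreted.
    static List<Object> decode(List<FieldType> schema, byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<Object> values = new ArrayList<>();
        for (FieldType type : schema) {
            values.add(type == FieldType.INT ? (Object) in.readInt() : in.readUTF());
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        List<FieldType> schema = Arrays.asList(FieldType.INT, FieldType.STRING);
        List<Object> values = Arrays.<Object>asList(42, "beam");
        byte[] data = encode(schema, values);
        System.out.println(decode(schema, data)); // prints [42, beam]
    }
}
```

A coder for such records would need the schema at construction time, which is exactly what is unavailable when different files may have different schemas.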
Instead, a reasonable approach is to treat a schemaless GenericRecord as unencodable and use the same approach as JdbcIO: a user-specified parse callback.
Suggested API: AvroIO.parseGenericRecords(SerializableFunction<GenericRecord, T> parseFn).from(filepattern).
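A rough sketch of the callback pattern behind this suggestion, using only the JDK. Everything here is a stand-in: the nested SerializableFunction interface is a local stand-in for the function type named in the suggested API, a Map plays the role of a GenericRecord, and record and parseRecords are hypothetical helpers, not part of any proposed or existing AvroIO API. The idea shown is that the reader applies parseFn to each record as soon as it is read, so only values of the user's type T, for which a coder is available, ever need to be encoded.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a user-specified parse callback; NOT the Beam or Avro API.
public class ParseCallbackSketch {
    // Local stand-in for a serializable one-argument function.
    interface SerializableFunction<InputT, OutputT> extends Serializable {
        OutputT apply(InputT input);
    }

    // Hypothetical stand-in for a record whose schema is only known at read time.
    static Map<String, Object> record(String name, int clicks) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("clicks", clicks);
        return r;
    }

    // The "reader" converts every record through parseFn immediately,
    // so the schema-dependent record type never leaves this method.
    static <T> List<T> parseRecords(
            List<Map<String, Object>> records,
            SerializableFunction<Map<String, Object>, T> parseFn) {
        List<T> parsed = new ArrayList<>();
        for (Map<String, Object> r : records) {
            parsed.add(parseFn.apply(r));
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> fileContents = new ArrayList<>();
        fileContents.add(record("a", 3));
        fileContents.add(record("b", 5));

        // The user decides how to turn each record into an encodable type (String here).
        List<String> parsed =
                parseRecords(fileContents, r -> r.get("name") + ":" + r.get("clicks"));
        System.out.println(parsed); // prints [a:3, b:5]
    }
}
```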
CC: [~mkhadikov] [~reuvenlax]