Posted to commits@beam.apache.org by "Etienne Chauchot (JIRA)" <ji...@apache.org> on 2017/11/06 16:29:00 UTC

[jira] [Commented] (BEAM-2993) AvroIO.write without specifying a schema

    [ https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240520#comment-16240520 ] 

Etienne Chauchot commented on BEAM-2993:
----------------------------------------

Because a PCollection is not ordered, one bundle may end up holding only SCHEMA1 records and another only SCHEMA2 records. Guessing the schema lazily from the "first" element would then write both bundles without any error, because it would guess SCHEMA1 from bundle 1 and SCHEMA2 from bundle 2. The result would be an Avro file that has two schemas, which is wrong.
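
To make the failure mode concrete, here is a minimal sketch in plain Java. It does not use Beam or Avro: Rec, writeBundle and writeAll are hypothetical stand-ins I made up for illustration, and the per-bundle check is only an assumption about how a lazy "first element" guess might behave.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LazySchemaSketch {
    // Hypothetical stand-in for a GenericRecord: it only carries its schema's name.
    static final class Rec {
        final String schema;
        Rec(String schema) { this.schema = schema; }
    }

    // Hypothetical per-bundle writer: guesses the schema from the first element,
    // then rejects any later element of the same bundle that has another schema.
    static String writeBundle(List<Rec> bundle) {
        String guessed = bundle.get(0).schema;
        for (Rec r : bundle) {
            if (!r.schema.equals(guessed)) {
                throw new IllegalStateException("schema mismatch in bundle: " + r.schema);
            }
        }
        return guessed;
    }

    // Writes each bundle independently and collects the schemas that reached the output.
    static Set<String> writeAll(List<List<Rec>> bundles) {
        Set<String> schemas = new LinkedHashSet<>();
        for (List<Rec> bundle : bundles) {
            schemas.add(writeBundle(bundle));
        }
        return schemas;
    }

    public static void main(String[] args) {
        // Each bundle is homogeneous, so no per-bundle check ever fires.
        List<List<Rec>> bundles = Arrays.asList(
            Arrays.asList(new Rec("SCHEMA1"), new Rec("SCHEMA1")),
            Arrays.asList(new Rec("SCHEMA2"), new Rec("SCHEMA2")));
        System.out.println(writeAll(bundles)); // prints [SCHEMA1, SCHEMA2]
    }
}
```

A bundle that mixes both schemas would be rejected by the check in this sketch, but the split shown in main passes silently, and that silent case is the one described above.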

> AvroIO.write without specifying a schema
> ----------------------------------------
>
>                 Key: BEAM-2993
>                 URL: https://issues.apache.org/jira/browse/BEAM-2993
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be able to write to Avro files using {{AvroIO}} without specifying a schema at build time. Consider the following use case: a user has a {{PCollection<GenericRecord>}} but the schema is only known while running the pipeline. {{AvroIO.writeGenericRecords}} needs the schema, but the schema is already available in {{GenericRecord}}. We should be able to call {{AvroIO.writeGenericRecords()}} with no schema.


