Posted to commits@beam.apache.org by "Kenneth Knowles (JIRA)" <ji...@apache.org> on 2018/03/06 04:59:00 UTC
[jira] [Assigned] (BEAM-3771) Unable to write using AvroIO without schema
[ https://issues.apache.org/jira/browse/BEAM-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles reassigned BEAM-3771:
-------------------------------------
Assignee: Chamikara Jayalath (was: Kenneth Knowles)
> Unable to write using AvroIO without schema
> -------------------------------------------
>
> Key: BEAM-3771
> URL: https://issues.apache.org/jira/browse/BEAM-3771
> Project: Beam
> Issue Type: Bug
> Components: beam-model
> Reporter: Darshan Mehta
> Assignee: Chamikara Jayalath
> Priority: Major
>
> I am working on a use case where I don't know the schema at compile time when writing a PCollection of GenericRecords to the file system. Here's how the use case works:
> * My Dataflow pipeline listens to a Pub/Sub subscription and receives messages in this format:
> {code:java}
> // {"schema" : <schema_id>, "payload" : "<payload>"}
> {code}
> * It then extracts the id, looks up the schema registry, and gets the schema for that specific element
> * The payload is then deserialised into a GenericRecord
> * A PCollection of these records is forwarded to the BigQuery writer, which writes them to BigQuery
> * The same PCollection is then passed to a storage writer that writes to the file system using AvroIO
> I am struggling with the last step: AvroIO expects a schema, whereas I do not know the schema at compile time. All I have is a collection of elements, each with a schema id embedded.
> Is there any way for AvroIO to write the records to the file system without a schema? If not, are there any alternative formats I could use to write to the file system?
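
The envelope handling described in the report can be sketched in plain Java as follows. This is a minimal illustration, not Beam code and not the reporter's implementation: the class `EnvelopeRouter` and its methods are hypothetical, and it assumes a numeric schema id and a payload string containing no escaped quotes. It splits each message into its schema id and payload, then buckets payloads by schema id, so that every bucket holds elements sharing a single schema that is known at runtime.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of Beam.
class EnvelopeRouter {
    // Assumption: the schema id is numeric and the payload has no escaped quotes.
    private static final Pattern ENVELOPE = Pattern.compile(
        "\\{\\s*\"schema\"\\s*:\\s*(\\d+)\\s*,\\s*\"payload\"\\s*:\\s*\"([^\"]*)\"\\s*\\}");

    // Returns (schemaId, payload) for one message, or throws if it does not match.
    static Map.Entry<String, String> parse(String message) {
        Matcher m = ENVELOPE.matcher(message.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised envelope: " + message);
        }
        return new SimpleEntry<>(m.group(1), m.group(2));
    }

    // Buckets payloads by schema id, preserving first-seen order of the ids.
    static Map<String, List<String>> groupBySchemaId(List<String> messages) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String message : messages) {
            Map.Entry<String, String> entry = parse(message);
            groups.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                  .add(entry.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList(
            "{\"schema\" : 7, \"payload\" : \"abc\"}",
            "{\"schema\" : 9, \"payload\" : \"def\"}",
            "{\"schema\" : 7, \"payload\" : \"ghi\"}");
        // Each key identifies one schema; its values are the payloads to decode with it.
        System.out.println(groupBySchemaId(messages));
    }
}
```

In a real pipeline the grouping key would drive the schema registry lookup, and each group could then be handed to a writer configured with that group's schema rather than a single compile-time schema.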