Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2020/08/20 17:25:00 UTC
[jira] [Commented] (BEAM-10769) Fix Avro IO documentation: when
fastavro is used, do not pass schema parsed by avro-python3.
[ https://issues.apache.org/jira/browse/BEAM-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181348#comment-17181348 ]
Valentyn Tymofieiev commented on BEAM-10769:
--------------------------------------------
Beam switched to fastavro as the default Avro library on Python 3. The fastavro-based Avro sink expects the schema as a dictionary, while the avro-python3-based Avro sink expects a schema that was previously parsed by avro.schema.Parse(). fastavro will not accept a schema parsed by avro-python3.
When a user moves a pipeline that uses the WriteToAvro transform to Python 3 but does not change how the schema is passed, and thus still passes a schema parsed by avro.schema.Parse(), fastavro will not be able to parse the schema, since it expects a dictionary. fastavro does not require a pre-parsed schema, although supplying a schema parsed by fastavro works too.
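As a rough illustration of the difference, the sketch below loads the schema with json.loads() and hands the resulting dictionary to WriteToAvro. The schema contents, the SampleInfo record name, and the write_records helper are made up for this example and are not taken from the affected pipeline.

```python
import json

# Hypothetical schema, for illustration only.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "SampleInfo",
  "namespace": "example.avro",
  "fields": [
    {"name": "sample_id", "type": "string"},
    {"name": "read_count", "type": "long"}
  ]
}
"""

# With the avro-python3-based sink the schema was parsed first, e.g.
#   schema = avro.schema.Parse(SCHEMA_JSON)
# With the fastavro-based sink, pass the plain dictionary instead.
SCHEMA = json.loads(SCHEMA_JSON)


def write_records(records, file_path_prefix):
    """Write dict records to Avro files, passing the schema as a dictionary."""
    # Imported lazily so the schema handling above can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.io.avroio import WriteToAvro

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | "CreateRecords" >> beam.Create(records)
            | "WriteToAvroFiles" >> WriteToAvro(
                file_path_prefix, schema=SCHEMA, file_name_suffix=".avro")
        )
```

A call such as write_records([{"sample_id": "s1", "read_count": 10}], "/tmp/sample_info") would then write the records without going through avro.schema.Parse() at all.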
The error may manifest as follows:
{noformat}
...lib/python3.7/site-packages/apache_beam/io/avroio.py", line 634, in open
return Writer(file_handle, self._schema, self._codec)
File "fastavro/_write.pyx", line 522, in fastavro._write.Writer.__init__
File "fastavro/_schema.pyx", line 71, in fastavro._schema.parse_schema
File "fastavro/_schema.pyx", line 85, in fastavro._schema._parse_schema
TypeError: unhashable type: 'RecordSchema' [while running 'SampleInfoToAvro/WriteToAvroFiles/Write/WriteImpl/WriteBundles']
{noformat}
To fix the error, pass the schema to the sink as a dictionary. https://github.com/apache/beam/pull/12638 is out to fix the documentation and to catch these errors with a better error message.
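For pipelines that still hold a parsed schema object, a small conversion step before the sink may be enough. The helper below is a hypothetical sketch and is not the check added in the pull request. It assumes that str() of a schema parsed by avro-python3 yields the schema's JSON definition, which should be verified against the avro version in use.

```python
import json


def as_schema_dict(schema):
    """Return a record schema as a dictionary (hypothetical helper)."""
    if isinstance(schema, dict):
        # Already in the form the fastavro-based sink expects.
        return schema
    # Assumption: str() of a parsed avro-python3 schema is its JSON definition.
    # A schema supplied as a JSON string takes the same path.
    try:
        result = json.loads(str(schema))
    except ValueError as exc:
        raise TypeError(
            "Cannot convert %r to a schema dictionary" % type(schema).__name__
        ) from exc
    if not isinstance(result, dict):
        raise TypeError("Expected a record schema that decodes to a dictionary")
    return result
```

The result of as_schema_dict(parsed_schema) can then be passed as the schema argument of WriteToAvro, and an unsupported object fails early with a TypeError that names the offending type.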
> Fix Avro IO documentation: when fastavro is used, do not pass schema parsed by avro-python3.
> --------------------------------------------------------------------------------------------
>
> Key: BEAM-10769
> URL: https://issues.apache.org/jira/browse/BEAM-10769
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Valentyn Tymofieiev
> Assignee: Valentyn Tymofieiev
> Priority: P2
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)