Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2022/05/25 00:25:00 UTC
[jira] [Commented] (BEAM-14508) Parquetio should produce a schema'd PCollection
[ https://issues.apache.org/jira/browse/BEAM-14508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541760#comment-17541760 ]
Brian Hulette commented on BEAM-14508:
--------------------------------------
Agreed! I think the ideal solution here would be to allow inferring schemas from a TypedDict typehint. Then the execution-time code wouldn't have to change; we'd just need to add a typehint. Unfortunately, we don't yet support schema inference from a TypedDict.
Until then, we could add an option that changes the output type to NamedTuple, similar to what Svetak is doing for BigQuery.
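To make the NamedTuple idea concrete, here is a minimal sketch of what a user could do by hand today: convert the dicts that ReadFromParquet emits into a NamedTuple and declare that type as the output type so Beam can infer a schema from it. The Record type, its fields, and the helper names are hypothetical, and this is not the proposed parquetio option itself.

```python
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical columns; replace with the columns of your Parquet file.
    name: str
    score: float


def to_record(row: dict) -> Record:
    # ReadFromParquet emits one dict per row, keyed by column name.
    # Columns that are not fields of Record are dropped.
    return Record(**{field: row[field] for field in Record._fields})


def read_records(pipeline, file_pattern):
    # Imported lazily so the helpers above stay usable without Beam installed.
    import apache_beam as beam

    return (
        pipeline
        | "ReadParquet" >> beam.io.ReadFromParquet(file_pattern)
        # Declaring the NamedTuple output type lets Beam infer a schema.
        | "ToRecord" >> beam.Map(to_record).with_output_types(Record)
    )
```

Depending on the Beam version, it may also be necessary to register a RowCoder for the NamedTuple type; treat that as something to check against your release.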
Finally, for any user who lands here, I want to point out that there is a workaround: use the DataFrame API [read_parquet|https://beam.apache.org/releases/pydoc/2.38.0/apache_beam.dataframe.io.html#apache_beam.dataframe.io.read_parquet] method, then call to_pcollection on the result.
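A minimal sketch of that workaround, assuming the DataFrame extras (pandas, pyarrow) are installed. The function name and its parameters are hypothetical; read_parquet and to_pcollection are the Beam DataFrame API calls mentioned above.

```python
def read_parquet_with_schema(pipeline, file_pattern):
    # Imported lazily: the DataFrame API needs pandas, which is an optional
    # dependency of Beam.
    from apache_beam.dataframe.convert import to_pcollection
    from apache_beam.dataframe.io import read_parquet

    # Applying read_parquet to the pipeline yields a deferred DataFrame.
    deferred_df = pipeline | read_parquet(file_pattern)
    # to_pcollection converts it back into a PCollection whose elements
    # carry a schema derived from the DataFrame's columns.
    return to_pcollection(deferred_df)
```

It would be used inside a pipeline roughly as rows = read_parquet_with_schema(p, "/path/to/*.parquet"), where the path is a placeholder.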
> Parquetio should produce a schema'd PCollection
> -----------------------------------------------
>
> Key: BEAM-14508
> URL: https://issues.apache.org/jira/browse/BEAM-14508
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Robert Bradshaw
> Assignee: Brian Hulette
> Priority: P2
>
> Or at least have an option to do so.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)