Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2022/05/25 00:25:00 UTC

[jira] [Commented] (BEAM-14508) Parquetio should produce a schema'd PCollection

    [ https://issues.apache.org/jira/browse/BEAM-14508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541760#comment-17541760 ] 

Brian Hulette commented on BEAM-14508:
--------------------------------------

Agreed! I think the ideal solution here would be to allow inferring schemas from a TypedDict typehint. Then the execution-time code wouldn't have to change; we'd just need to add a typehint. Unfortunately, we don't yet support schema inference from a TypedDict.
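
To illustrate why a TypedDict typehint carries enough information for this, here is a minimal standard-library sketch. It is not Beam code: the Record type and the named_tuple_from_typed_dict helper are hypothetical names made up for this example, and an actual implementation in Beam's schema inference might look quite different.

```python
from typing import NamedTuple, TypedDict, get_type_hints


class Record(TypedDict):
    # Hypothetical element type for rows read from a Parquet file.
    id: int
    name: str


def named_tuple_from_typed_dict(typed_dict_cls):
    """Build a NamedTuple class with the same field names and types.

    Hypothetical helper: it only shows that a TypedDict exposes its
    field names and types, which is what schema inference would need.
    """
    fields = list(get_type_hints(typed_dict_cls).items())
    return NamedTuple(typed_dict_cls.__name__ + "Row", fields)


RecordRow = named_tuple_from_typed_dict(Record)

# The dicts produced at execution time stay plain dicts; only the
# typehint is needed to derive the field names and types.
row = RecordRow(**{"id": 1, "name": "a"})
```

The point of the sketch is that the dict elements themselves would not need to change, only the declared type.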

Until then, we could add an option that changes the output type to NamedTuple, similar to what Svetak is doing for BigQuery.

Finally, for any user who lands here, I want to point out that there is a workaround: you can read the files with the DataFrame API's [read_parquet|https://beam.apache.org/releases/pydoc/2.38.0/apache_beam.dataframe.io.html#apache_beam.dataframe.io.read_parquet] function, then call to_pcollection on the result to get a schema'd PCollection.

> Parquetio should produce a schema'd PCollection
> -----------------------------------------------
>
>                 Key: BEAM-14508
>                 URL: https://issues.apache.org/jira/browse/BEAM-14508
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Robert Bradshaw
>            Assignee: Brian Hulette
>            Priority: P2
>
> Or at least have an option to do so.


