Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2022/01/12 18:28:01 UTC

[jira] [Commented] (BEAM-13166) Versions after `2.28.0` fail to infer grouping decoders after a date is selected from a data structure

    [ https://issues.apache.org/jira/browse/BEAM-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474800#comment-17474800 ] 

Beam JIRA Bot commented on BEAM-13166:
--------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.


> Versions after `2.28.0` fail to infer grouping decoders after a date is selected from a data structure
> ------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-13166
>                 URL: https://issues.apache.org/jira/browse/BEAM-13166
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.29.0, 2.30.0, 2.31.0, 2.32.0, 2.33.0
>         Environment: We're using python linux docker images, such as `python:bullseye`, and building an image that installs packages from a `requirements.txt` file with a beam requirement such as `apache-beam ~= 2.28.0`
>            Reporter: Blaine Hansen
>            Priority: P2
>              Labels: stale-P2
>             Fix For: 2.28.0
>
>
> The code below throws this TypeError on the affected versions, but works as expected on 2.28.0:
> `TypeError: Unable to deterministically encode '2021-11-02' of type '<class 'datetime.date'>', please provide a type hint for the input of 'GroupByKey' [while running 'Create/Map(decode)']`
> {code:python}
> import typing
> from datetime import date
> import apache_beam as beam
> from apache_beam.testing.test_pipeline import TestPipeline
> with TestPipeline() as pipeline:
> 	today = date.today()
> 	results = (
> 		pipeline
> 		| beam.Create([(1, { 'd': today }), (1, { 'd': today })])
> 		| beam.MapTuple(lambda i, d: (d['d'], i)) # <-- this step only requires output type hints on versions after 2.28.0, and only if the date is being "projected" from some other data structure
> 		| beam.CombinePerKey(sum) # <-- if this aggregation is removed, the pipeline also works without error
> 	)
> 	results | beam.Map(print)
> {code}
> This Stack Overflow question describes the same problem:
> https://stackoverflow.com/questions/69409693/how-do-i-use-a-datetime-date-value-in-apache-beam-groupby
> The error can be worked around by registering a `DateCoder` and adding output type hints to the projecting `MapTuple` step. Since neither is necessary in other situations or on 2.28.0, this looks like a bug. Our production pipelines would need many of these tedious type hints to work properly, so we're effectively blocked from upgrading to the newest version.
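
For readers hitting the same error, the workaround mentioned in the quoted report (a registered `DateCoder` plus an output type hint on the projecting step) might look roughly like the sketch below. The report names `DateCoder` but does not show its implementation, so the ISO 8601 string encoding here is an assumption, as are the `run_example` helper and the import guard, which exists only so the encode/decode logic can be tried without Beam installed.

```python
import typing
from datetime import date

try:
    import apache_beam as beam
    _CoderBase = beam.coders.Coder
except ImportError:
    # Assumption: fall back to a plain base class so the encode/decode
    # logic can still be exercised where Beam is not installed.
    beam = None
    _CoderBase = object


class DateCoder(_CoderBase):
    """Hypothetical coder: stores a datetime.date as its ISO 8601 string."""

    def encode(self, value):
        return value.isoformat().encode("ascii")

    def decode(self, encoded):
        return date.fromisoformat(encoded.decode("ascii"))

    def is_deterministic(self):
        # Equal dates always produce identical bytes, which is what
        # grouping by key requires.
        return True

    def to_type_hint(self):
        return date


if beam is not None:
    # Make DateCoder the coder Beam picks whenever a type hint names `date`.
    beam.coders.registry.register_coder(date, DateCoder)


def run_example():
    """Runs the reported pipeline with an output type hint added (needs Beam)."""
    today = date.today()
    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | beam.Create([(1, {"d": today}), (1, {"d": today})])
            # The hint tells Beam the key is a `date`, so it can look up
            # the registered deterministic coder for the grouping step.
            | beam.MapTuple(lambda i, d: (d["d"], i)).with_output_types(
                typing.Tuple[date, int])
            | beam.CombinePerKey(sum)
            | beam.Map(print)
        )
```

With Beam installed, calling `run_example()` exercises the same `Create`, `MapTuple`, and `CombinePerKey` steps as the report. Whether this fully avoids the error on every affected version has not been verified here; it only mirrors the workaround the reporter describes.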



--
This message was sent by Atlassian Jira
(v8.20.1#820001)