Posted to issues@beam.apache.org by "Blaine Hansen (Jira)" <ji...@apache.org> on 2021/11/02 19:42:00 UTC
[jira] [Updated] (BEAM-13166) Versions after `2.28.0` fail to infer
grouping decoders after a date is selected from a data structure
[ https://issues.apache.org/jira/browse/BEAM-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Blaine Hansen updated BEAM-13166:
---------------------------------
Description:
The code below throws this type error on the affected versions, but works as expected on 2.28.0:
`TypeError: Unable to deterministically encode '2021-11-02' of type '<class 'datetime.date'>', please provide a type hint for the input of 'GroupByKey' [while running 'Create/Map(decode)']`
{code:python}
import typing
from datetime import date
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
with TestPipeline() as pipeline:
    today = date.today()
    results = (
        pipeline
        | beam.Create([(1, { 'd': today }), (1, { 'd': today })])
        | beam.MapTuple(lambda i, d: (d['d'], i)) # <-- this step only requires output type hints on versions after 2.28.0, and only if the date is being "projected" from some other data structure
        | beam.CombinePerKey(sum) # <-- if this aggregation is removed, the pipeline also works without error
    )
    results | beam.Map(print)
{code}
This Stack Overflow question describes the same problem:
https://stackoverflow.com/questions/69409693/how-do-i-use-a-datetime-date-value-in-apache-beam-groupby
It's possible to fix the errors by registering a `DateCoder` and adding output type hints to the projection `MapTuple` step, but since this isn't necessary in other situations and versions, it seems this is a bug. Our production pipelines will need to add many of these tedious type hints in order to work properly, so we're effectively blocked from upgrading to the newest version.
was:
The code below throws this type error on the affected versions, but works as expected on 2.28.0:
`TypeError: Unable to deterministically encode '2021-11-02' of type '<class 'datetime.date'>', please provide a type hint for the input of 'GroupByKey' [while running 'Create/Map(decode)']`
{code:python}
import typing
from datetime import date
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
with TestPipeline() as pipeline:
    today = date.today()
    results = (
        pipeline
        | beam.Create([(1, { 'd': today }), (1, { 'd': today })])
        | beam.MapTuple(lambda i, d: (d['d'], i)) # <-- this step only requires output type hints on versions after 2.28.0, and only if the date is being "projected" from some other data structure
        | beam.CombinePerKey(sum) # <-- if this aggregation is removed, the pipeline also works without error
    )
    results | beam.Map(print)
{code}
This Stack Overflow question describes the same problem:
https://stackoverflow.com/questions/69409693/how-do-i-use-a-datetime-date-value-in-apache-beam-groupby
It's possible to fix the errors by registering a `DateCoder` and adding output type hints to the projection `MapTuple` step, but since this works fine in other situations and versions, it seems this is a bug. Our production pipelines will need to add many of these tedious type hints in order to work properly, so we're effectively blocked from upgrading to the newest version.
> Versions after `2.28.0` fail to infer grouping decoders after a date is selected from a data structure
> ------------------------------------------------------------------------------------------------------
>
> Key: BEAM-13166
> URL: https://issues.apache.org/jira/browse/BEAM-13166
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.29.0, 2.30.0, 2.31.0, 2.32.0, 2.33.0
> Environment: We're using Python Linux Docker images, such as `python:bullseye`, and building an image that installs packages from a `requirements.txt` file with a Beam requirement such as `apache-beam ~= 2.28.0`
> Reporter: Blaine Hansen
> Priority: P2
> Fix For: 2.28.0
>
>
> The code below throws this type error on the affected versions, but works as expected on 2.28.0:
> `TypeError: Unable to deterministically encode '2021-11-02' of type '<class 'datetime.date'>', please provide a type hint for the input of 'GroupByKey' [while running 'Create/Map(decode)']`
> {code:python}
> import typing
> from datetime import date
> import apache_beam as beam
> from apache_beam.testing.test_pipeline import TestPipeline
> with TestPipeline() as pipeline:
>     today = date.today()
>     results = (
>         pipeline
>         | beam.Create([(1, { 'd': today }), (1, { 'd': today })])
>         | beam.MapTuple(lambda i, d: (d['d'], i)) # <-- this step only requires output type hints on versions after 2.28.0, and only if the date is being "projected" from some other data structure
>         | beam.CombinePerKey(sum) # <-- if this aggregation is removed, the pipeline also works without error
>     )
>     results | beam.Map(print)
> {code}
> This Stack Overflow question describes the same problem:
> https://stackoverflow.com/questions/69409693/how-do-i-use-a-datetime-date-value-in-apache-beam-groupby
> It's possible to fix the errors by registering a `DateCoder` and adding output type hints to the projection `MapTuple` step, but since this isn't necessary in other situations and versions, it seems this is a bug. Our production pipelines will need to add many of these tedious type hints in order to work properly, so we're effectively blocked from upgrading to the newest version.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)