Posted to issues@beam.apache.org by "Willi Schinmeyer (Jira)" <ji...@apache.org> on 2021/11/10 15:33:00 UTC

[jira] [Created] (BEAM-13217) TypeCheckError due to CoGroupByKey output mis-deduction

Willi Schinmeyer created BEAM-13217:
---------------------------------------

             Summary: TypeCheckError due to CoGroupByKey output mis-deduction
                 Key: BEAM-13217
                 URL: https://issues.apache.org/jira/browse/BEAM-13217
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
    Affects Versions: 2.33.0
            Reporter: Willi Schinmeyer


After upgrading our Python project from 2.31.0 to 2.33.0, we started getting TypeCheckErrors such as
{quote}apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 'all_data/combine_new_and_all': requires {{Tuple[Tuple[Any, Any], Dict[str, Iterable[_CombinedEntry]]]}} but got {{Tuple[Tuple[int, int], Dict[str, List[Union[]]]]}} for element
{quote}
where the value type of the {{CoGroupByKey()}} output is apparently mis-deduced as {{Dict[str, List[Union[]]]}}, i.e. a list whose element type is an empty {{Union}}.

I managed to build a small repro case:
{code:python}
import apache_beam as beam
from typing import Dict, Iterable, Tuple

{
    "foo": [(42, "foo")],
    "bar": [(42, "bar")],
} | beam.CoGroupByKey().with_output_types(Tuple[int, Dict[str, Iterable[str]]])
{code}
which raises
{quote}apache_beam.typehints.decorators.TypeCheckError: Output type hint violation at CoGroupByKey: expected {{Tuple[int, Dict[str, Iterable[str]]]}}, got {{Tuple[int, Dict[str, List[Union[]]]]}}
{quote}
or alternatively, using a TestPipeline:
{code:python}
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to
from typing import Dict, Iterable, Tuple

with TestPipeline() as p:
    actual = {
        "foo": p | "create_foo" >> beam.Create([(42, "foo")]),
        "bar": p | "create_bar" >> beam.Create([(42, "bar")]),
    } | beam.CoGroupByKey().with_output_types(Tuple[int, Dict[str, Iterable[str]]])
    assert_that(actual, equal_to([(42, {"foo": ["foo"], "bar": ["bar"]})]))
{code}
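For reference, the result shape that the {{Tuple[int, Dict[str, Iterable[str]]]}} hint describes can be mimicked in plain Python. This is only a sketch of the expected data layout: {{co_group_by_key}} is a hypothetical helper written for illustration, not Beam's implementation, and it says nothing about how Beam infers types.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def co_group_by_key(
    tagged: Dict[str, Iterable[Tuple[Hashable, str]]],
) -> List[Tuple[Hashable, Dict[str, List[str]]]]:
    """Hypothetical plain-Python stand-in for the grouping; not Beam code."""
    # Every key gets one list per input tag, even if that input has no values.
    grouped: Dict[Hashable, Dict[str, List[str]]] = defaultdict(
        lambda: {tag: [] for tag in tagged}
    )
    for tag, pairs in tagged.items():
        for key, value in pairs:
            grouped[key][tag].append(value)
    return list(grouped.items())


# Same inputs as the repro case above.
result = co_group_by_key({"foo": [(42, "foo")], "bar": [(42, "bar")]})
```

Each element is a (key, dict) pair whose dict maps an input tag to a list of strings, which is why I would expect {{Dict[str, Iterable[str]]}} to be accepted as the value type.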
One more thing, regarding the {{Tuple[Any, Any]}} in the original error message above: that part can be reproduced like this:
{code:python}
import apache_beam as beam
from typing import Dict, Iterable, NewType, Tuple

key = NewType("key", int)
{
    "foo": [(key(1337), "foo")],
    "bar": [(key(1337), "bar")],
} | beam.CoGroupByKey().with_output_types(Tuple[key, Dict[str, Iterable[str]]])
{code}
which raises
{quote}apache_beam.typehints.decorators.TypeCheckError: Output type hint violation at CoGroupByKey: expected {{Tuple[Any, Dict[str, Iterable[str]]]}}, got {{Tuple[int, Dict[str, List[Union[]]]]}}
{quote}
It looks like {{NewType}} is treated as {{Any}}? That surprised me.
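One piece of this can be checked without Beam. With the standard {{typing}} module alone, a {{NewType}} value is an instance of its base type at runtime, which would account for the {{int}} on the "got" side of the message. The sketch below shows only that; it does not explain why the hint side ends up as {{Any}}, which presumably happens somewhere in Beam's type hint conversion.

```python
from typing import NewType

key = NewType("key", int)
value = key(1337)

# At runtime the NewType callable returns its argument unchanged,
# so the value is a plain int with no trace of "key".
runtime_type = type(value)

# The declared base type is still recorded on the NewType itself.
declared_base = key.__supertype__
```

Since the base type is available via {{__supertype__}}, I would have expected the hint to resolve to {{int}} rather than {{Any}}.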



--
This message was sent by Atlassian Jira
(v8.20.1#820001)