Posted to issues@beam.apache.org by "Kamil Wasilewski (Jira)" <ji...@apache.org> on 2020/08/18 09:29:00 UTC
[jira] [Comment Edited] (BEAM-9154) Move Chicago Taxi Example to Python 3
[ https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179488#comment-17179488 ]
Kamil Wasilewski edited comment on BEAM-9154 at 8/18/20, 9:28 AM:
------------------------------------------------------------------
> is this reproducible on newer versions of TFT + Tensorflow?
To be honest, I have no idea. I tried TensorFlow 1.15.3 (the latest 1.x version), but that version depends on tfx-bsl, which in turn depends on an older version of Beam, so we end up with a circular dependency. I suspect TensorFlow 2.x won't work either, since the code was written with TensorFlow 1.x in mind.
> Is there a way TFT team can reproduce the error?
Yes. Steps to reproduce:
1. Have Python 3.7 (3.5 and 3.6 might work as well) and the latest version of Beam from the master branch
2. Have a GCP project configured
3. Execute the Gradle task: ./gradlew :sdks:python:test-suites:dataflow:py2:chicagoTaxiExample -PgcsRoot=gs://GCS_BUCKET -PpipelineOptions="--num_workers=5 --autoscaling_algorithm=NONE" -PpythonVersion=3.7
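For anyone scripting the reproduction, here is a minimal sketch that assembles the command from step 3 as an argument list. The helper name build_gradle_command and its parameters are hypothetical, GCS_BUCKET is still a placeholder for a real bucket, and the sketch assumes the double-dash spelling --num_workers for the pipeline option. It only builds and prints the command; it does not run Gradle.

```python
import shlex


def build_gradle_command(gcs_bucket, python_version="3.7", num_workers=5):
    """Assemble the argv for the Gradle task quoted in step 3 (hypothetical helper)."""
    # Assumption: pipeline options use the double-dash form.
    pipeline_options = (
        f"--num_workers={num_workers} --autoscaling_algorithm=NONE"
    )
    return [
        "./gradlew",
        ":sdks:python:test-suites:dataflow:py2:chicagoTaxiExample",
        f"-PgcsRoot=gs://{gcs_bucket}",
        f"-PpipelineOptions={pipeline_options}",
        f"-PpythonVersion={python_version}",
    ]


# GCS_BUCKET is a placeholder, exactly as in the step above.
command = build_gradle_command("GCS_BUCKET")

# Print a shell-safe rendering; the pipeline options contain a space,
# so they are quoted as a single argument.
print(" ".join(shlex.quote(arg) for arg in command))
```

Passing the list to subprocess.run(command, cwd=...) from the Beam checkout would be one way to execute it, but that part is left out here because it needs a configured GCP project.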
By the way, I now get a different error when running the example on Python 3.7:
{code:java}
apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 'ParDo(_SeparateMetricsAndPlotsFn)': requires Tuple[Union[Tuple[Tuple[str, Union[bytes, float, int]], ...], Tuple[]], List[Any]] but got Tuple[Union[Tuple[Tuple[str, Union[bytes, float, int]], ...], Tuple[]], Dict[str, Any]] for element
Full type hint:
IOTypeHints[inputs=((Tuple[Union[Tuple[Tuple[str, Union[bytes, float, int]], ...], Tuple[]], List[Any]],), {}), outputs=((Any,), {})]
strip_iterable()based on:
IOTypeHints[inputs=((Tuple[Union[Tuple[Tuple[str, Union[bytes, float, int]], ...], Tuple[]], List[Any]],), {}), outputs=((Any,), {})]
from_callable(process)
signature: (element: Tuple[Union[Tuple[()], Tuple[Tuple[str, Union[bytes, int, float]], ...]], List[Any]])
File "/Users/kamilwasilewski/.pyenv/versions/beam-chicago/lib/python3.7/site-packages/tensorflow_model_analysis/evaluators/aggregate.py", line 493
{code}
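To make the mismatch in that message easier to see: the hint on the process method expects the second component of each element to be a List[Any], but the element that arrived carried a Dict[str, Any] there. The sketch below imitates that check in plain Python. It is not Beam's type checker and not the TFMA code; the helper names and the sample values are made up for illustration.

```python
from typing import Any, Tuple


def payload_kind(element: Tuple[Any, Any]) -> str:
    """Name the container type of the second component of a (slice_key, payload) pair."""
    _slice_key, payload = element
    return type(payload).__name__


def require_list_payload(element: Tuple[Any, Any]) -> None:
    """Raise TypeError unless the payload is a list (stand-in for the List[Any] hint)."""
    _slice_key, payload = element
    if not isinstance(payload, list):
        raise TypeError(
            "requires (slice_key, List[Any]) but got "
            f"(slice_key, {payload_kind(element)})"
        )


# A slice key shaped like Tuple[Tuple[str, Union[bytes, float, int]], ...];
# the values are invented for this example.
slice_key = (("company", b"example"),)

ok_element = (slice_key, [1.0, 2.0])          # payload is a list: accepted
bad_element = (slice_key, {"metric": 1.0})    # payload is a dict: rejected

require_list_payload(ok_element)
try:
    require_list_payload(bad_element)
except TypeError as err:
    message = str(err)
    print(message)
```

So either the upstream step produces a dict where the annotated signature promises a list, or the annotation in aggregate.py is narrower than what the code actually handles. I have not checked which of the two it is.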
was (Author: kamilwu):
> is this reproducible on newer versions of TFT + Tensorflow?
To be honest, I have no idea. I tried tensorflow 1.15.3 (the latest 1.x version), but that version depends on tfx-bsl, which depends on an older version of Beam, which leads us to a circular dependency. I suppose tensorflow 2.x won't work either, since the code was written with tensorflow 1.x in mind.
> Is there a way TFT team can reproduce the error?
Yes. Steps to reproduce:
1. Have python 3.7 (3.5 and 3.6 might work as well) and the latest version of Beam from the master branch
2. Have a GCP project configured
3. Execute gradle task: ./gradlew :sdks:python:test-suites:dataflow:py2:chicagoTaxiExample -PgcsRoot=gs://GCS_BUCKET -PpipelineOptions="--num_workers=5 --autoscaling_algorithm=NONE" -PpythonVersion=3.7
By the way, now I had a different error when running the example on Python 3.7:
{code:java}
apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 'ParDo(_SeparateMetricsAndPlotsFn)': requires Tuple[Union[Tuple[Tuple[str, Union[bytes, float, int]], ...], Tuple[]], List[Any]] but got Tuple[Union[Tuple[Tuple[str, Union[bytes, float, int]], ...], Tuple[]], Dict[str, Any]] for element
Full type hint:
IOTypeHints[inputs=((Tuple[Union[Tuple[Tuple[str, Union[bytes, float, int]], ...], Tuple[]], List[Any]],), {}), outputs=((Any,), {})]
strip_iterable()based on:
IOTypeHints[inputs=((Tuple[Union[Tuple[Tuple[str, Union[bytes, float, int]], ...], Tuple[]], List[Any]],), {}), outputs=((Any,), {})]
from_callable(process)
signature: (element: Tuple[Union[Tuple[()], Tuple[Tuple[str, Union[bytes, int, float]], ...]], List[Any]])
File "/Users/kamilwasilewski/.pyenv/versions/beam-chicago/lib/python3.7/site-packages/tensorflow_model_analysis/evaluators/aggregate.py", line 493
{code}
> Move Chicago Taxi Example to Python 3
> -------------------------------------
>
> Key: BEAM-9154
> URL: https://issues.apache.org/jira/browse/BEAM-9154
> Project: Beam
> Issue Type: Improvement
> Components: testing
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: P1
>
> The Chicago Taxi Example[1] should be moved to the latest version of Python supported by Beam (currently it's Python 3.7).
> At the moment, the following error occurs when running the benchmark on Python 3.7 (requires further investigation):
> {code:java}
> Traceback (most recent call last):
> File "preprocess.py", line 259, in <module>
> main()
> File "preprocess.py", line 254, in main
> project=known_args.metric_reporting_project
> File "preprocess.py", line 155, in transform_data
> ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
> File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
> return self.transform.__ror__(pvalueish, self.label)
> File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
> result = p.apply(self, pvalueish, label)
> File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
> return self.apply(transform, pvalueish)
> File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
> return m(transform, input, options)
> File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
> return transform.expand(input)
> File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
> input_metadata))
> File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
> output_signature = self._preprocessing_fn(copied_inputs)
> File "preprocess.py", line 102, in preprocessing_fn
> _fill_in_missing(inputs[key]),
> KeyError: 'company'
> {code}
> [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi
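The traceback in the description ends in a plain dictionary lookup, inputs[key], failing for 'company'. A small diagnostic like the one below could help narrow that down by listing which expected feature keys are absent before the lookup happens. This is only a sketch: missing_feature_keys is a hypothetical helper, and apart from 'company' the feature names are invented, not taken from preprocess.py.

```python
def missing_feature_keys(inputs, expected_keys):
    """Return the expected keys that are not present in inputs, sorted by name."""
    return sorted(key for key in expected_keys if key not in inputs)


# Invented stand-in for the dict passed to preprocessing_fn;
# 'company' is deliberately absent, as in the traceback.
inputs = {"fare": [12.5], "trip_miles": [3.1]}
expected_keys = ["company", "fare", "trip_miles"]

missing = missing_feature_keys(inputs, expected_keys)
if missing:
    print("missing feature keys:", missing)
```

If 'company' shows up as missing in the real pipeline, the next thing to compare would be the feature spec or schema against the columns actually read under Python 3.7.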