You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Kamil Wasilewski (Jira)" <ji...@apache.org> on 2020/01/20 10:58:00 UTC

[jira] [Updated] (BEAM-9154) Move Chicago Taxi Example to Python 3

     [ https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kamil Wasilewski updated BEAM-9154:
-----------------------------------
    Description: 
The Chicago Taxi Example[1] should be moved to the latest version of Python supported by Beam (currently it's Python 3.7).

At the moment, the following error occurs when running the benchmark on Python 3.7 (requires futher investigation):
{code:java}
Traceback (most recent call last):
  File "preprocess.py", line 259, in <module>
    main()
  File "preprocess.py", line 254, in main
    project=known_args.metric_reporting_project
  File "preprocess.py", line 155, in transform_data
    ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
    return self.apply(transform, pvalueish)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
    return m(transform, input, options)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
    return transform.expand(input)
  File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
    input_metadata))
  File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
    output_signature = self._preprocessing_fn(copied_inputs)
  File "preprocess.py", line 102, in preprocessing_fn
    _fill_in_missing(inputs[key]),
KeyError: 'company'
{code}
[1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi

  was:
The Chicago Taxi Example[1] should be moved to the latest version of Python supported by Beam (currently it's Python 3.7). The benchmark should run both on Dataflow and Flink.

At the moment, the following error occurs when running the benchmark (requires futher investigation):
{code:java}
Traceback (most recent call last):
  File "preprocess.py", line 259, in <module>
    main()
  File "preprocess.py", line 254, in main
    project=known_args.metric_reporting_project
  File "preprocess.py", line 155, in transform_data
    ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
    return self.apply(transform, pvalueish)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
    return m(transform, input, options)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
    return transform.expand(input)
  File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
    input_metadata))
  File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
    output_signature = self._preprocessing_fn(copied_inputs)
  File "preprocess.py", line 102, in preprocessing_fn
    _fill_in_missing(inputs[key]),
KeyError: 'company'
{code}
[1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi


> Move Chicago Taxi Example to Python 3
> -------------------------------------
>
>                 Key: BEAM-9154
>                 URL: https://issues.apache.org/jira/browse/BEAM-9154
>             Project: Beam
>          Issue Type: Improvement
>          Components: testing
>            Reporter: Kamil Wasilewski
>            Assignee: Kamil Wasilewski
>            Priority: Major
>
> The Chicago Taxi Example[1] should be moved to the latest version of Python supported by Beam (currently it's Python 3.7).
> At the moment, the following error occurs when running the benchmark on Python 3.7 (requires futher investigation):
> {code:java}
> Traceback (most recent call last):
>   File "preprocess.py", line 259, in <module>
>     main()
>   File "preprocess.py", line 254, in main
>     project=known_args.metric_reporting_project
>   File "preprocess.py", line 155, in transform_data
>     ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
>     return self.transform.__ror__(pvalueish, self.label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
>     result = p.apply(self, pvalueish, label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
>     return self.apply(transform, pvalueish)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
>     return transform.expand(input)
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
>     input_metadata))
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
>     output_signature = self._preprocessing_fn(copied_inputs)
>   File "preprocess.py", line 102, in preprocessing_fn
>     _fill_in_missing(inputs[key]),
> KeyError: 'company'
> {code}
> [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi



--
This message was sent by Atlassian Jira
(v8.3.4#803005)