Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/09/03 17:45:00 UTC

[jira] [Work logged] (BEAM-7505) Create SideInput Python Load Test Jenkins Job

     [ https://issues.apache.org/jira/browse/BEAM-7505?focusedWorklogId=478718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-478718 ]

ASF GitHub Bot logged work on BEAM-7505:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Sep/20 17:44
            Start Date: 03/Sep/20 17:44
    Worklog Time Spent: 10m 
      Work Description: tysonjh commented on pull request #11856:
URL: https://github.com/apache/beam/pull/11856#issuecomment-686648107


   > @lukecwik @tysonjh
   > 
   > I've recently started working on side input tests with windows again. Here's the part of my pipeline I ended up with:
   > 
   > ```python
   > import apache_beam as beam
   > from apache_beam.transforms import window
   > from apache_beam.testing.synthetic_pipeline import SyntheticSDFAsSource
   > 
   > side_input = (
   >     self.pipeline
   >     | 'Main input: create' >> beam.Create(range(self.windows))
   >     | 'Main input: assign timestamps' >> beam.ParDo(AssignTimestamps())
   >     | 'Main input: apply windows' >> beam.WindowInto(window.FixedWindows(1))
   >     # One synthetic-source config dict per element, i.e. per one-second window.
   >     | beam.Map(
   >         lambda _: {
   >             'key_size': 100,
   >             'value_size': 900,
   >             'num_records': 10000,
   >             'initial_splitting_num_bundles': 0,
   >             'initial_splitting_desired_bundle_size': 0,
   >             'sleep_per_input_record_sec': 0,
   >             'initial_splitting': 'const',
   >         })
   >     | 'Read from synthetic source' >> beam.ParDo(SyntheticSDFAsSource()))
   > ```
   > 
   > `self.windows` is `1000`, and `AssignTimestamps` is a simple DoFn that uses each element's value as its timestamp:
   > 
   > ```python
   > class AssignTimestamps(beam.DoFn):
   >   def __init__(self):
   >     # Avoid having to use save_main_session
   >     self.window = window
   > 
   >   def process(self, element):
   >     yield self.window.TimestampedValue(element, element)
   > ```
   > 
   > This doesn't work on Dataflow, although it works fine on the Direct runner. Here's the error message:
   > 
   > ```
   > File "/usr/local/lib/python3.7/site-packages/apache_beam/io/restriction_trackers.py", line 91, in __init__
   >     assert isinstance(offset_range, OffsetRange)
   > RuntimeError: AssertionError [while running 'Read from synthetic source']
   > ```
   > 
   > I did some debugging and discovered that `offset_range` is `None` in this context.
   > 
   > Does SDF properly support windowed side inputs on Dataflow? If not, do you know another way to write tests for side inputs involving windows? Let me know if you need more information. Thanks.
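The traceback points at a type check in the offset restriction tracker's constructor: the tracker expects its restriction to be an `OffsetRange`, so a `None` restriction fails before any records are read. The minimal sketch below mimics only that check using hypothetical stand-in classes (`OffsetRangeStub`, `OffsetTrackerStub`, `make_tracker`); it does not import Beam, is not Beam's implementation, and does not explain why the restriction ends up `None` on Dataflow, which the comment leaves open.

```python
class OffsetRangeStub:
    """Hypothetical stand-in for an offset range [start, stop)."""

    def __init__(self, start, stop):
        if start > stop:
            raise ValueError("start must not exceed stop")
        self.start = start
        self.stop = stop


class OffsetTrackerStub:
    """Hypothetical stand-in that mimics the type check from the traceback."""

    def __init__(self, offset_range):
        # Mirrors the failing line: the restriction must be an offset range.
        assert isinstance(offset_range, OffsetRangeStub)
        self._range = offset_range


def make_tracker(restriction):
    """Return (tracker, error) so both outcomes can be inspected."""
    try:
        return OffsetTrackerStub(restriction), None
    except AssertionError as exc:
        return None, exc


# A well-formed restriction constructs a tracker.
ok_tracker, ok_error = make_tracker(OffsetRangeStub(0, 10000))
# A None restriction (what the debugging above found) trips the assertion.
bad_tracker, bad_error = make_tracker(None)
```

Because the check is a plain `assert`, the failure surfaces as an `AssertionError`, which the runner then wraps with the step name, as in the `RuntimeError` line of the traceback.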
   
   @boyuanzz could you weigh in here?
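For a sense of the side input's shape: each element `i` in `range(1000)` gets timestamp `i` (seconds), and `FixedWindows(1)` places it in the one-second window `[i, i + 1)`. The sketch below reproduces that assignment in plain Python with the standard fixed-window formula `start = ts - (ts - offset) % size` (offset 0 assumed) and estimates the payload per window from the synthetic-source config. It assumes `key_size` and `value_size` are byte counts per record and that each config dict yields `num_records` records; the helper names are hypothetical and nothing here calls Beam.

```python
WINDOWS = 1000  # value of self.windows in the comment
SOURCE_CONFIG = {
    'key_size': 100,     # assumed: bytes per key
    'value_size': 900,   # assumed: bytes per value
    'num_records': 10000,
}


def fixed_window(timestamp, size=1, offset=0):
    """Return the [start, end) bounds of the fixed window containing timestamp."""
    start = timestamp - (timestamp - offset) % size
    return (start, start + size)


def bytes_per_window(config):
    """Approximate raw payload described by one config dict (assumes byte sizes)."""
    return config['num_records'] * (config['key_size'] + config['value_size'])


# Element i is timestamped i, so each element lands in its own window.
windows = {fixed_window(i) for i in range(WINDOWS)}
per_window = bytes_per_window(SOURCE_CONFIG)
total = per_window * len(windows)
print(len(windows), per_window, total)  # prints: 1000 10000000 10000000000
```

Under those assumptions the side input spans 1000 windows of roughly 10 MB of raw key/value data each, about 10 GB in total; encoding overhead may change the actual size.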


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 478718)
    Time Spent: 10h 20m  (was: 10h 10m)

> Create SideInput Python Load Test Jenkins Job
> ---------------------------------------------
>
>                 Key: BEAM-7505
>                 URL: https://issues.apache.org/jira/browse/BEAM-7505
>             Project: Beam
>          Issue Type: Sub-task
>          Components: testing
>            Reporter: Kasia Kucharczyk
>            Assignee: Kamil Wasilewski
>            Priority: P2
>          Time Spent: 10h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)