Posted to issues@beam.apache.org by "Juta Staes (JIRA)" <ji...@apache.org> on 2019/03/14 09:22:00 UTC

[jira] [Comment Edited] (BEAM-5627) Investigate why test_split_at_fraction_exhaustive consistently fails to split after 101 attempts on Python 3

    [ https://issues.apache.org/jira/browse/BEAM-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792477#comment-16792477 ] 

Juta Staes edited comment on BEAM-5627 at 3/14/19 9:21 AM:
-----------------------------------------------------------

I investigated why this test leaves warnings:

This test runs two threads: one reads the data and the other tries to split it, which is only possible if the first thread has not finished reading. The test leaves no warnings only if both execution orders are observed within 100 runs, i.e. the split succeeds at least once and fails at least once. Python 3 schedules threads differently (the GIL is handed over after a time interval rather than after a count of bytecode instructions), and because each thread finishes too quickly to be interrupted, the threads always execute in the same order, even over 1000 runs. If I lengthen both threads by adding `time.sleep(0.003)`, the test no longer leaves any warnings.
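To make the race concrete, here is a minimal, self-contained sketch of the mechanism described above: one thread reads, another tries to split, and the attempt is repeated while counting how often the split succeeds versus fails. `ToyReader`, `run_attempt` and `observe_outcomes` are hypothetical names for illustration only; they are not Beam's reader API or the code in `source_test_utils.py`. The optional `delay` stands in for the `time.sleep(0.003)` experiment. Which outcome each attempt yields depends on the interpreter's thread scheduling, so the counts vary between runs.

```python
import threading
import time


class ToyReader:
    """Hypothetical stand-in for a source reader; NOT Beam's reader API.

    Items are consumed one at a time. A split is possible only while at
    least two unread items remain, mirroring "a split only works if the
    reading thread has not finished".
    """

    def __init__(self, num_items, delay=0.0):
        self._lock = threading.Lock()
        self.position = 0       # index of the next item to read
        self.stop = num_items   # exclusive end of the range this reader owns
        self._delay = delay

    def read_all(self):
        while True:
            with self._lock:
                if self.position >= self.stop:
                    return
                self.position += 1
            if self._delay:
                # Lengthen the read, like the time.sleep(0.003) experiment.
                time.sleep(self._delay)

    def try_split(self):
        """Give away the second half of the unread range; False if too late."""
        with self._lock:
            remaining = self.stop - self.position
            if remaining < 2:
                return False
            self.stop = self.position + remaining // 2
            return True


def run_attempt(num_items, delay=0.0):
    """Race one reader thread against one splitter thread; return the split outcome."""
    reader = ToyReader(num_items, delay)
    outcome = []

    def split():
        if delay:
            time.sleep(delay)  # the comment adds the sleep to both threads
        outcome.append(reader.try_split())

    threads = [threading.Thread(target=reader.read_all),
               threading.Thread(target=split)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome[0]


def observe_outcomes(num_items=5, attempts=100, delay=0.0):
    """Count (successful splits, failed splits) over repeated races.

    The test only passes cleanly if both counts are non-zero, i.e. both
    execution orders were observed.
    """
    successes = sum(run_attempt(num_items, delay) for _ in range(attempts))
    return successes, attempts - successes
```

Comparing `observe_outcomes()` with `observe_outcomes(delay=0.003)` on a given interpreter shows how much the outcome distribution depends on how long each thread runs, which is the fragility the comment points out.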

Test code regarding threads: [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/source_test_utils.py#L651]

Conclusion: this test needs to be redesigned for Python 3. Do you have any suggestions on how to handle this?

cc: [~chamikara], [~tvalentyn]



> Investigate why test_split_at_fraction_exhaustive consistently fails to split after 101 attempts on Python 3
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-5627
>                 URL: https://issues.apache.org/jira/browse/BEAM-5627
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Juta Staes
>            Priority: Minor
>              Labels: triaged
>             Fix For: Not applicable
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> ERROR: test_split_at_fraction_exhaustive (apache_beam.io.source_test_utils_test.SourceTestUtilsTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py", line 120, in test_split_at_fraction_exhaustive
>     source = self._create_source(data)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py", line 43, in _create_source
>     source = LineSource(self._create_file_with_data(data))
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py", line 35, in _create_file_with_data
>     f.write(line + '\n')
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/tempfile.py", line 622, in func_wrapper
>     return func(*args, **kwargs)
> TypeError: a bytes-like object is required, not 'str'
> Also similar:
> ======================================================================
> ERROR: test_file_sink_writing (apache_beam.io.filebasedsink_test.TestFileBasedSink)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/filebasedsink_test.py", line 121, in test_file_sink_writing
>     init_token, writer_results = self._common_init(sink)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/filebasedsink_test.py", line 103, in _common_init
>     writer1 = sink.open_writer(init_token, '1')
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/options/value_provider.py", line 133, in _f
>     return fnc(self, *args, **kwargs)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/filebasedsink.py", line 185, in open_writer
>     return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/filebasedsink.py", line 385, in __init__
>     self.temp_handle = self.sink.open(temp_shard_path)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/filebasedsink_test.py", line 82, in open
>     file_handle.write('[start]')
> TypeError: a bytes-like object is required, not 'str'
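Both quoted tracebacks end in the same Python 3 error: a `str` is written to a file handle that only accepts bytes. In the first one the handle comes from `tempfile.NamedTemporaryFile`, which opens in binary mode (`'w+b'`) by default. The sketch below shows one way to write such lines on Python 3; `create_file_with_data` is a hypothetical helper modelled on the test's `_create_file_with_data`, not the actual Beam fix.

```python
import tempfile


def create_file_with_data(lines):
    """Write str lines to a temporary file and return its path.

    Hypothetical helper modelled on the test's _create_file_with_data.
    tempfile.NamedTemporaryFile opens in binary mode ('w+b') by default,
    so on Python 3 each line must be encoded to bytes before writing.
    """
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for line in lines:
            # f.write(line + '\n') would raise
            # TypeError: a bytes-like object is required, not 'str'
            f.write((line + '\n').encode('utf-8'))
        return f.name
```

Passing `mode='w'` to `NamedTemporaryFile` and keeping `str` writes would also work; encoding explicitly keeps the file's bytes predictable. The second traceback fails the same way, and writing a bytes literal such as `b'[start]'` satisfies the binary handle.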



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)