Posted to issues@beam.apache.org by "Valentyn Tymofieiev (JIRA)" <ji...@apache.org> on 2019/03/05 05:22:00 UTC

[jira] [Commented] (BEAM-6748) Splitting logic in Avro IO tests behaves unexpectedly in Python 3

    [ https://issues.apache.org/jira/browse/BEAM-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784093#comment-16784093 ] 

Valentyn Tymofieiev commented on BEAM-6748:
-------------------------------------------

I suspect the problem here may be with the test.

Adding some debug output at [https://github.com/apache/beam/blob/af2e5bd8a42ea0eb7ce12ea29b8a32757accc197/sdks/python/apache_beam/io/avroio.py#L465], we can see that on Python 3 block.size is roughly 16000 (bytes?), while on Python 2 it is roughly 64000. The exact numbers vary slightly, but I think some implementation detail in the fastavro library makes the default block size <=64k on Python 2 and <=16k on Python 3.

So on Python 2 we have 3 blocks total, while on Python 3 we have 11 blocks.

The test assumes that there are always 3 blocks: [https://github.com/apache/beam/blob/af2e5bd8a42ea0eb7ce12ea29b8a32757accc197/sdks/python/apache_beam/io/avroio_test.py#L302]
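
The arithmetic behind these block counts can be sketched roughly as follows. This is only an illustration: the total payload size is an assumed figure chosen to be consistent with both observed block counts, not a measured value, and the helper names are hypothetical. It also assumes that, while the last block is being read, the tracker reports (blocks consumed, blocks remaining) = (n - 1, 1), which is what the (2, 1) expectation in the test suggests.

```python
import math

# Assumed total size of the serialized records, in bytes. This number is
# hypothetical: it is picked only because it is consistent with both
# observed block counts (3 blocks at ~64000 bytes, 11 at ~16000 bytes).
ASSUMED_TOTAL_BYTES = 165000


def block_count(total_bytes, approx_block_size):
    """Estimate how many blocks result if each holds about approx_block_size bytes."""
    return int(math.ceil(total_bytes / float(approx_block_size)))


def last_block_split_points(num_blocks):
    """(consumed, remaining) pair assumed to be reported while reading the last block."""
    return (num_blocks - 1, 1)


for label, approx_size in [("Python 2", 64000), ("Python 3", 16000)]:
    n = block_count(ASSUMED_TOTAL_BYTES, approx_size)
    print(label, n, last_block_split_points(n))
```

Under these assumptions the roughly 64000-byte case gives 3 blocks and (2, 1), and the roughly 16000-byte case gives 11 blocks and (10, 1), which matches the assertion failure in the issue description.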

[~chamikara], do you know how fastavro.read.block_reader selects the block size? Are there any requirements or guarantees that dictate a certain size, or may it be implementation/platform-dependent?

> Splitting logic in Avro IO tests behaves unexpectedly in Python 3
> -----------------------------------------------------------------
>
>                 Key: BEAM-6748
>                 URL: https://issues.apache.org/jira/browse/BEAM-6748
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>
> *apache_beam.io.avroio_test.TestAvro.test_split_points*
> *apache_beam.io.avroio_test.TestFastAvro.test_split_points*
> fail with:
>  
> {code:java}
> Traceback (most recent call last):
>  File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", line 308, in test_split_points
>  self.assertEquals(split_points_report[-10:], [(2, 1)] * 10)
> AssertionError: Lists differ: [(10, 1), (10, 1), (10, 1), (10, 1), (10, 1[42 chars], 1)] != [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2[32 chars], 1)]
> First differing element 0:
> (10, 1)
> (2, 1)
> + [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1)]
> - [(10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1),
> - (10, 1)] {code}


