Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/10/29 17:31:39 UTC

[GitHub] [beam] tvalentyn edited a comment on pull request #15775: [BEAM-12730] Python. Custom delimiter add corner case

tvalentyn edited a comment on pull request #15775:
URL: https://github.com/apache/beam/pull/15775#issuecomment-954919090


   > No, the '\r' would not get appended to the input.
   > I checked this case with the code below. It reads the text without any transformation and only tries to find an extra '\r' symbol.
   > 
   > from tempfile import NamedTemporaryFile
   > 
   > import apache_beam as beam
   > 
   > # One byte less than the default buffer size, so that '\r' is the
   > # last byte of the first buffer and '\n' starts the next one.
   > READ_BUFFER_SIZE = 8191
   > # DEFAULT_READ_BUFFER_SIZE = 8192
   > delimiter = b'\r\n'
   > 
   > with NamedTemporaryFile("wb") as temp_file:
   >   temp_file.write(b'a' * READ_BUFFER_SIZE + delimiter + b'bbbbb')
   >   temp_file.flush()
   >   with beam.Pipeline() as pipeline:
   >     (
   >         pipeline
   >         | beam.io.ReadFromText(
   >             file_pattern=temp_file.name,
   >             delimiter=delimiter)
   >         # Prints True if the delimiter is still present in an element.
   >         | beam.Map(lambda x: print(x.find(delimiter.decode('utf-8')) >= 0))
   >     )
   > output:
   > False
   > False
   
   Thanks for the snippet. The case I was concerned about is a different one: we use the default separator (which evaluates to \n), and the read buffer boundary falls between the '\r' and the '\n' of a '\r\n' sequence. That is not the same as a custom separator that is itself \r\n. I checked that this case also works, since we just extend the buffer and then still look back 1 character when we encounter \n.
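
To illustrate why extending the buffer keeps the look-back safe, here is a minimal standalone sketch. It is not Beam's implementation: `read_lines` and its parameters are hypothetical names, and the only behavior it models is "append new bytes to the unread tail, then check the byte before each \n".

```python
import io


def read_lines(stream, buffer_size=8192):
    """Yield lines split on b'\\n' or b'\\r\\n', terminator stripped.

    Hypothetical helper for illustration only; not Beam's reader.
    """
    buf = b''
    pos = 0  # start of the next unread line inside buf
    while True:
        idx = buf.find(b'\n', pos)
        if idx < 0:
            chunk = stream.read(buffer_size)
            if not chunk:
                break
            # Extend the unread tail instead of replacing it, so a '\r'
            # that ended the previous chunk is still in the buffer.
            buf = buf[pos:] + chunk
            pos = 0
            continue
        end = idx
        # Look back one byte: a '\r' directly before '\n' is part of
        # the line terminator, not of the line itself.
        if idx > pos and buf[idx - 1:idx] == b'\r':
            end = idx - 1
        yield buf[pos:end]
        pos = idx + 1
    if pos < len(buf):
        # Trailing data without a final terminator.
        yield buf[pos:]


# '\r' is the last byte of the first 8192-byte read, '\n' the first of the next.
data = b'a' * 8191 + b'\r\n' + b'bbbbb'
lines = list(read_lines(io.BytesIO(data), buffer_size=8192))
print([len(line) for line in lines])
```

Because the first chunk is kept when the second one arrives, the '\r' is still available for the one-byte look-back even though it was read in an earlier call.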
   

