Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/10/24 17:54:24 UTC

[GitHub] [beam] nikie edited a comment on pull request #15775: [BEAM-12730] Python. Custom delimiter add corner case

nikie edited a comment on pull request #15775:
URL: https://github.com/apache/beam/pull/15775#issuecomment-950366215


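   There is another issue, related to bundle splitting, that needs to be addressed.
   This code in the `read_records` method (https://github.com/apache/beam/blob/6173abb2e241865d89dff9ae679f4d422be84ee9/sdks/python/apache_beam/io/textio.py#L169-L183) takes the desired `start_offset` provided by `range_tracker` and locates the start of the next line at or after it. To detect a line that begins exactly at `start_offset`, it seeks back one byte, to `start_offset - 1`, before searching for a separator. This assumes that every line separator ends with `\n` (the default delimiter). With a multi-byte custom delimiter, when a record starts exactly at `start_offset`, only the last byte of the preceding delimiter lies at or after `start_offset - 1`, so the search cannot match it and that record can be skipped.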
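
   A minimal pure-Python model of the one-byte look-back illustrates the problem. It is not Beam's textio code: `find_record_start` and `read_bundle` are hypothetical helpers, and the sketch assumes that a bundle reads every record whose start offset lies in `[start, stop)` and that the delimiter is a fixed byte string. With `\n`, the look-back finds the right record at any split point; with the two-byte delimiter `|*`, a record that starts exactly at the split point is read by neither bundle.

```python
# Simplified model of the split logic described above; NOT Beam's textio code.
# Hypothetical helpers. Assumptions: a bundle reads every record whose start
# offset lies in [start, stop), and the delimiter is a fixed byte string.


def find_record_start(data: bytes, start_offset: int, delimiter: bytes) -> int:
    """Offset of the first record a bundle beginning at start_offset reads."""
    if start_offset == 0:
        return 0
    # Look back a single byte only, as in the Python code discussed above.
    idx = data.find(delimiter, start_offset - 1)
    if idx == -1:
        return len(data)  # no record starts inside this bundle
    return idx + len(delimiter)


def read_bundle(data: bytes, start: int, stop: int, delimiter: bytes) -> list:
    """Return the records whose start offset falls in [start, stop)."""
    records = []
    pos = find_record_start(data, start, delimiter)
    while pos < stop and pos < len(data):
        end = data.find(delimiter, pos)
        if end == -1:
            end = len(data)  # last record has no trailing delimiter
        records.append(data[pos:end])
        pos = end + len(delimiter)
    return records


# Default delimiter: splitting at offset 3 (start of b"bb") loses nothing.
text = b"aa\nbb\ncc"
print(read_bundle(text, 0, 3, b"\n") + read_bundle(text, 3, len(text), b"\n"))
# [b'aa', b'bb', b'cc']

# Two-byte delimiter: b"bb" starts exactly at offset 4, the split point.
text = b"aa|*bb|*cc"
print(read_bundle(text, 0, 4, b"|*") + read_bundle(text, 4, len(text), b"|*"))
# [b'aa', b'cc']  (b'bb' is read by neither bundle)
```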
   
   For comparison, here is the equivalent `TextSource` code in the Java SDK, which does take a custom delimiter into account:
   https://github.com/apache/beam/blob/6173abb2e241865d89dff9ae679f4d422be84ee9/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L149-L160
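
   A minimal pure-Python model (not Beam's textio code, and not a proposed patch) of a bundle reader whose look-back spans the full delimiter length. As far as I can tell, this is what the linked Java code does: it seeks to `startOffset - delimiter.length` when a custom delimiter is set. `find_record_start` and `read_bundle` are hypothetical helpers, and the sketch assumes a bundle reads every record whose start offset lies in `[start, stop)` and that the delimiter is a fixed byte string that does not overlap itself. Starting the search `len(delimiter)` bytes before `start_offset` ensures that any delimiter ending at or after `start_offset` is seen in full.

```python
# Simplified model of a split-aware record reader; NOT Beam's textio code.
# Hypothetical helpers. Assumptions: a bundle reads every record whose start
# offset lies in [start, stop), and the delimiter is a fixed byte string that
# does not overlap itself.


def find_record_start(data: bytes, start_offset: int, delimiter: bytes) -> int:
    """Offset of the first record at or after start_offset."""
    if start_offset == 0:
        return 0
    # Step back len(delimiter) bytes (not just one) so that a delimiter ending
    # exactly at start_offset is still found in full.
    search_from = max(start_offset - len(delimiter), 0)
    idx = data.find(delimiter, search_from)
    if idx == -1:
        return len(data)  # no record starts inside this bundle
    return idx + len(delimiter)


def read_bundle(data: bytes, start: int, stop: int, delimiter: bytes) -> list:
    """Return the records whose start offset falls in [start, stop)."""
    records = []
    pos = find_record_start(data, start, delimiter)
    while pos < stop and pos < len(data):
        end = data.find(delimiter, pos)
        if end == -1:
            end = len(data)  # last record has no trailing delimiter
        records.append(data[pos:end])
        pos = end + len(delimiter)
    return records


# Every split point now yields each record exactly once.
text = b"aa|*bb|*cc"
for split in range(1, len(text)):
    parts = (read_bundle(text, 0, split, b"|*")
             + read_bundle(text, split, len(text), b"|*"))
    assert parts == [b"aa", b"bb", b"cc"], (split, parts)
print("all split points OK")
```

   Widening the look-back costs at most `len(delimiter) - 1` extra bytes of reading per bundle, and a one-byte delimiter such as `\n` behaves exactly as before.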


-- 
This is an automated message from the Apache Git Service.