You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 21:39:06 UTC

[GitHub] [beam] damccorm opened a new issue, #21122: WriteToFiles destination no changing condition

damccorm opened a new issue, #21122:
URL: https://github.com/apache/beam/issues/21122

   `WriteToFiles` seems to not be working as it should. I have been running some tests and my conclusion is that once the condition for the destination is met once, it is not checked again until a new condition (not seen prev) is met. This results in wrong distribution of elements across files.
   
   Example code 1:
   
   ```
   
   (p | "Create" >> beam.Create(range(100))
      | beam.Map(lambda x: str(x))
      | fileio.WriteToFiles(
   
                path="./dynamic/",
                 destination=lambda n: "in" if n in ["17"] else "out",
   
                sink=fileio.TextSink(),
                 file_naming=fileio.destination_prefix_naming("test"))
   )
   
   ```
   
   
   Here, the expected result should be a file called "in-xyz" containing only "17" and another called "out-xyz" containing the rest. What we see is that "out" contains numbers 0 to 16 and once 17 condition is met, the rest of numbers would go to "in". So we would have "out" from 0 to 16, "in" from 17 on, which is wrong.
   
   Changing the number shows it too.
   
   ____________________
   Example code 2:
   
   ```
   
   def odd_even(x):
       value = "even" if int(x) % 2 == 0 else "odd"
       print(value, x)
       return
   value
   
   (p | "Create" >> beam.Create(range(100))
      | beam.Map(lambda x: str(x))
      | fileio.WriteToFiles(
   
                path="./dynamic/",
                 destination=odd_even,
                 sink=fileio.TextSink(),
   
                file_naming=fileio.destination_prefix_naming("test"))
   )
   
   ```
   
   
   We can see that the `odd_even` fn is return the right value,  but destination is still wrong. We get "even" only with 0 and "odd" with the rest of numbers, since the condition changed with element "1"
   
   ____________________
   Example code 3:
   
   Trying more conditionals or different `file_naming` doesn't fix this
   
   ```
   
   def test_15(n):
       three = "three" if int(n) % 3 == 0 else ""
       five = "five" if int(n) % 5 ==
   0 else ""
       return f"value-{three}{five}"
   
   def time_format():
       def _inner(window, pane, shard_index,
   total_shards, compression, destination):
           print(window, pane, shard_index, total_shards, compression,
   destination)
           return f"dest-{destination}-shards-{shard_index}-of-{total_shards}"
       return
   _inner
   
   (p | "Create" >> beam.Create(range(N))
      | beam.Map(lambda x: str(x))
      | fileio.WriteToFiles(
   
                path="./dynamic/",
                 destination=test_15,
                 sink=fileio.TextSink(),
   
                file_naming=time_format())
   )
   
   ```
   
   
   adding shards or other variables don't help either.
   
   I have tested this in different SDKs (27, 29, 30) and Dataflow, DirectRunner, InteractiveRunner
   
   
   
   
   Imported from Jira [BEAM-12554](https://issues.apache.org/jira/browse/BEAM-12554). Original Jira may contain additional context.
   Reported by: Inigosj.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on issue #21122: WriteToFiles destination no changing condition

Posted by GitBox <gi...@apache.org>.
damccorm commented on issue #21122:
URL: https://github.com/apache/beam/issues/21122#issuecomment-1146689266

   Unable to assign user @Abacn. If able, self-assign, otherwise tag @damccorm so that he can assign you. Because of GitHub's spam prevention system, your activity is required to enable assignment in this repo.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #21122: WriteToFiles destination no changing condition

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #21122:
URL: https://github.com/apache/beam/issues/21122#issuecomment-1149008339

   @damccorm Thanks. This issue is now fixed in #17708.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm closed issue #21122: WriteToFiles destination no changing condition

Posted by GitBox <gi...@apache.org>.
damccorm closed issue #21122: WriteToFiles destination no changing condition
URL: https://github.com/apache/beam/issues/21122


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org