Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/03/09 21:22:18 UTC

[GitHub] [beam] TheNeuralBit commented on pull request #14174: [BEAM-XXX] Port join extensions to Python

TheNeuralBit commented on pull request #14174:
URL: https://github.com/apache/beam/pull/14174#issuecomment-794471320


   Sorry, I completely missed the questions you asked!
   
   > 1. How to handle Tags.
   > ~~In Java, the code uses internal knowledge to create what seems to be a class identity? There seem to be some Tags being used in Python but it doesn't seem to have the same power.
   > Would be great if someone could shed some light why Tags are there in the first place and how the strategy in Python is.~~
   > Since this produces a simpler dict in Python, will hardcode the strings.
   
   :+1: 
   
   > 2. How to handle KV
   > I have a hard time finding examples in code that use KV. Since there is a typehint I used this as far as possible in Python code. For the actual KV I then used tuples. Is this fine?
   
   Yes, 2-tuples are the preferred way to represent KVs at execution time in Python.
   
   > 3. Coders
   > There seems to be some Coder support but it seems like there is no equivalent to PCollection.setCoder. How is this supposed to work in Python?
   
   In Python, we always infer the coder from the PCollection's `element_type`, which is determined from type hints. Often we just fall back to `FastPrimitivesCoder`. In your case it should be enough to just use `typehints.KV`.
   
   > 4. CoGbk*
   > There seem to be no further util classes around CoGbk*. I assume for the most part will have to implement these.
   
   As discussed elsewhere, we generally just use Python primitives rather than the CoGbk* util classes. Let me know if there are specific ones that don't seem to be supported in Python.

