Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/10/22 14:17:17 UTC

[GitHub] [pulsar] baynes opened a new issue #8344: Pulsar Identity SerDe behaviour and documentation

baynes opened a new issue #8344:
URL: https://github.com/apache/pulsar/issues/8344


   If you look at http://pulsar.apache.org/docs/en/functions-develop/#serde under the Python tab it says
   
   "In Python, the default SerDe is identity, meaning that the type is serialized as whatever type the producer function returns."
   "You can use the IdentitySerde, which leaves the data unchanged. The IdentitySerDe is the default."
   
   This strongly gives the impression that the default `IdentitySerDe` does not change the message in any way. This is not the case -- it will attempt to convert incoming bytes to one of float, int, string and only leaves them as bytes when those conversions fail. This can result in unexpected conversions (we have had binary data unexpectedly converted to string).
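
A minimal sketch of the conversion behaviour described above, for illustration only. It is not the Pulsar source: the function name `guessing_deserialize` is hypothetical, and the order in which the types are tried (int, then float, then str) is an assumption.

```python
def guessing_deserialize(input_bytes):
    """Illustrative stand-in for the behaviour reported in this issue."""
    # Assumed order of attempts: int, float, str.
    for typ in (int, float, str):
        try:
            return typ(input_bytes.decode("utf-8"))
        except (UnicodeDecodeError, ValueError):
            # Not valid UTF-8, or not parseable as this type: try the next one.
            continue
    # Only reached when the payload is not valid UTF-8.
    return input_bytes


print(repr(guessing_deserialize(b"42")))        # 42 (an int, not bytes)
print(repr(guessing_deserialize(b"3.5")))       # 3.5 (a float)
print(repr(guessing_deserialize(b"\x00\x01")))  # '\x00\x01' (binary data became a str)
print(repr(guessing_deserialize(b"\xff\xfe")))  # b'\xff\xfe' (left as bytes)
```

Under these assumptions, any binary payload that happens to be valid UTF-8 comes back as a `str`, which matches the surprise reported above.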
   
   It also attempts the reverse on the function result. Fortunately this does not result in unexpected behaviour, though it does lead to muddled/sloppy programming as people are careless with the type of the return value.
   
   There are several options for correcting this:
   
   1: Fix the code so the `IdentitySerDe` is just that - it leaves the bytes unchanged. One could then have a `Paddington Bear SerDe` (well intentioned and helpful but tends to get things wrong) which does what the existing `IdentitySerDe` does and also have `FloatSerDe`, `IntSerDe` and `StringSerDe` to cover the other cases reliably.
   
   2: Change the documentation on the `IdentitySerDe` to explain what it really does and its dangers, but leave it as the default. Introduce `FloatSerDe`, `IntSerDe`, `StringSerDe` and `BytesSerDe` to cover the cases reliably.
   
   3: Fix the code so the `IdentitySerDe` is just that - it leaves the bytes unchanged. Also have `FloatSerDe`, `IntSerDe` and `StringSerDe` to cover the other cases reliably. Make `StringSerDe` the default on the guess this is the most common use case.
   
   My preference would be option 1, but I suspect the installed code base would need option 2. Option 3 is a sort of compromise.
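
As a rough sketch of what option 1 could look like: the classes below are hypothetical and self-contained, not existing Pulsar classes. They only assume a SerDe exposes `serialize` and `deserialize` methods, and that numbers travel as their text form (an assumption, not something specified in this issue).

```python
class StrictIdentitySerDe:
    """Hypothetical: passes bytes through untouched in both directions."""

    def serialize(self, value):
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("StrictIdentitySerDe only accepts bytes")
        return bytes(value)

    def deserialize(self, input_bytes):
        return input_bytes


class StringSerDe:
    """Hypothetical: UTF-8 text only; fails loudly on anything else."""

    def serialize(self, value):
        if not isinstance(value, str):
            raise TypeError("StringSerDe only accepts str")
        return value.encode("utf-8")

    def deserialize(self, input_bytes):
        return input_bytes.decode("utf-8")


class IntSerDe:
    """Hypothetical: integers carried as decimal text (assumed wire format)."""

    def serialize(self, value):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("IntSerDe only accepts int")
        return str(value).encode("ascii")

    def deserialize(self, input_bytes):
        return int(input_bytes.decode("ascii"))


class FloatSerDe:
    """Hypothetical: floats carried as their repr text (assumed wire format)."""

    def serialize(self, value):
        if not isinstance(value, float):
            raise TypeError("FloatSerDe only accepts float")
        return repr(value).encode("ascii")

    def deserialize(self, input_bytes):
        return float(input_bytes.decode("ascii"))


print(StrictIdentitySerDe().deserialize(b"42"))  # b'42'
print(IntSerDe().deserialize(b"42"))             # 42
```

With this split, the caller chooses the type explicitly, and a mismatch raises an error instead of being silently converted.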
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] sijie commented on issue #8344: Pulsar Identity SerDe behaviour and documentation

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #8344:
URL: https://github.com/apache/pulsar/issues/8344#issuecomment-714646904


   Option 1 should be preferred.  Are you interested in contributing a bug fix?





[GitHub] [pulsar] baynes commented on issue #8344: Pulsar Identity SerDe behaviour and documentation

Posted by GitBox <gi...@apache.org>.
baynes commented on issue #8344:
URL: https://github.com/apache/pulsar/issues/8344#issuecomment-737224118


   > Option 1 should be preferred. Are you interested in contributing a bug fix?
   
   Maybe, but it won't be until after Christmas that I have any time to try. I am happy for someone else to take it on if they want to.

