Posted to issues@openwhisk.apache.org by GitBox <gi...@apache.org> on 2018/08/15 18:28:33 UTC

[GitHub] abaruni commented on issue #277: Should decode not encode UTF-8 messages?

URL: https://github.com/apache/incubator-openwhisk-package-kafka/issues/277#issuecomment-413291004
 
 
   @ScottChapman 
   
   We run
   
   https://github.com/apache/incubator-openwhisk-package-kafka/blob/master/provider/consumer.py#L455
   
   merely to ensure that the data is valid Unicode. The motivation is that we have received corrupted data from Message Hub in the past. That message is passed as part of the payload to the request, which itself attempts to encode the incoming data through the `json` module. In fact, in Python 2, calling `encode` on a byte string runs an implicit `decode` (with the ASCII codec) before attempting the actual encode. Likewise, calling `decode` on a unicode string runs an implicit `encode` before attempting the actual decode.
   
   ```
   >>> '\xb6'.encode('utf-8')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   UnicodeDecodeError: 'ascii' codec can't decode byte 0xb6 in position 0: ordinal not in range(128)
   >>> u'\xb6'.decode('utf-8')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
       return codecs.utf_8_decode(input, errors, True)
   UnicodeEncodeError: 'ascii' codec can't encode character u'\xb6' in position 0: ordinal not in range(128)
   ```
   
   As you can see, the call to `'\xb6'.encode('utf-8')` results in a Unicode**Decode**Error and the call to `u'\xb6'.decode('utf-8')` results in a Unicode**Encode**Error.
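
The implicit step can be spelled out. Here is a minimal sketch in Python 3 syntax, where bytes and text are distinct types and nothing is converted implicitly. It assumes the Python 2 behaviour shown in the session above, in which the hidden conversion uses the ASCII codec; the variable names are illustrative only.

```python
# On a Python 2 byte string, '\xb6'.encode('utf-8') behaves roughly like
# raw.decode('ascii').encode('utf-8'); the first step is the one that fails.
raw = b'\xb6'
try:
    raw.decode('ascii').encode('utf-8')
    implicit_decode_error = None
except UnicodeDecodeError as exc:
    implicit_decode_error = exc

# On a Python 2 unicode string, u'\xb6'.decode('utf-8') behaves roughly like
# text.encode('ascii').decode('utf-8'); again the first step fails.
text = '\xb6'
try:
    text.encode('ascii').decode('utf-8')
    implicit_encode_error = None
except UnicodeEncodeError as exc:
    implicit_encode_error = exc

print(type(implicit_decode_error).__name__)
print(type(implicit_encode_error).__name__)
```

In both cases the exception comes from the hidden ASCII conversion, not from the UTF-8 step that was asked for.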
   
   But the ultimate point of running `value.encode('utf-8')` is merely to ensure that we are working with valid Unicode before passing it down to other modules such as `json` and `requests`, since those modules will raise errors on invalid data if we don't verify it beforehand.
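
A check of this kind might look like the sketch below. It is written for Python 3 and is not the provider's actual code: `is_valid_utf8` and `build_payload` are hypothetical names, and it assumes the message value arrives as raw bytes.

```python
import json


def is_valid_utf8(raw):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def build_payload(raw):
    """Serialize a message value to JSON, rejecting corrupt input early.

    Hypothetical helper: raises ValueError up front instead of letting the
    failure surface later inside json or an HTTP client.
    """
    if not is_valid_utf8(raw):
        raise ValueError('message value is not valid UTF-8')
    return json.dumps({'value': raw.decode('utf-8')})
```

Validating up front keeps the failure next to the message that caused it, so a corrupt record can be logged or skipped before any request is built.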
   
   The data is therefore arriving already corrupt; it is not being corrupted by this function call.
