Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/13 20:00:06 UTC

[GitHub] [airflow] quoc-t-le opened a new issue #15355: MysqlHook

quoc-t-le opened a new issue #15355:
URL: https://github.com/apache/airflow/issues/15355


   I cannot get emojis to transfer cleanly to the target.
   
   I am converting the source MySQL database to utf8mb4. When I tell Airflow to use utf8, the emojis are moved from source to target as '?'; when I switch to utf8mb4, the emoji in the target comes back as \ud83e\udd70.
   
   ![image](https://user-images.githubusercontent.com/5314630/114612201-d3951d80-9c6f-11eb-831d-46811106a54e.png)
   
   Might `use_unicode` need to be set to true for utf8mb4?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr edited a comment on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
uranusjr edited a comment on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-820871119


   `0xd83d 0xde03` is the UTF-16 representation of 😃, so it seems like `sql_to_gcs` is not handling surrogate pairs correctly. Either way, the MySQL hook seems to work properly (even on Python 2, that’s the correct code it should return). Please do file a bug if Airflow 2 still has the bug! (And add a link to this comment so it’s easier for others to identify the exact issue.)
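
To make the surrogate-pair relationship above concrete, here is a minimal Python 3 sketch. It is an illustration only, not code from Airflow; `combine_surrogates` is a hypothetical helper name introduced here.

```python
# Illustrative only: relate U+1F603 to the UTF-16 code units quoted above.
emoji = "\U0001F603"  # the 😃 character

# Encoding as big-endian UTF-16 yields two 16-bit code units (a surrogate pair).
utf16 = emoji.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print([hex(u) for u in units])  # ['0xd83d', '0xde03']


def combine_surrogates(high, low):
    """Recombine a high/low surrogate pair into a single Unicode code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDE03)
print(hex(code_point))  # 0x1f603
```

The two code units are a correct encoding of the character; they only look wrong when a consumer displays or stores them as two separate escapes instead of recombining them.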





[GitHub] [airflow] quoc-t-le closed issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
quoc-t-le closed issue #15355:
URL: https://github.com/apache/airflow/issues/15355


   





[GitHub] [airflow] quoc-t-le commented on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
quoc-t-le commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-821324755


   I was able to get Airflow 2 running, and the problem is fixed there.





[GitHub] [airflow] uranusjr commented on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-820871119


   `0xd83d 0xde03` is the UTF-16 representation of 😃, so it seems like `sql_to_gcs` is not handling surrogate pairs correctly. Either way, the MySQL hook seems to work properly (even on Python 2, that’s the correct code it should return). Please do file a bug if Airflow 2 still has the bug!





[GitHub] [airflow] quoc-t-le commented on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
quoc-t-le commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-819540989


   The documentation says we can override this with use_unicode=False. But when I look at the MySQL hook, https://github.com/apache/airflow/blob/v1-10-stable/airflow/hooks/mysql_hook.py, use_unicode is only set to True (so I am assuming it is otherwise False?) if the charset we pass in the extra JSON is utf8 / utf-8, not utf8mb4.





[GitHub] [airflow] boring-cyborg[bot] commented on issue #15355: MysqlHook

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-819014937


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] quoc-t-le commented on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
quoc-t-le commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-819488297


   airflow 1.10.12
   python 2.7.13
   mysql 5.7
   





[GitHub] [airflow] uranusjr commented on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-819199557


   What version of Python and Airflow are you on?





[GitHub] [airflow] quoc-t-le commented on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
quoc-t-le commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-820710327


   MySQL to GCS export in CSV format works fine. There might be an error somewhere in the base operator, https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/operators/sql_to_gcs.py, when the user picks JSON format. It looks like it was updated in version 2, so I will have to jump to Airflow 2 and see if it is fixed.





[GitHub] [airflow] uranusjr commented on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-819518745


   Thanks! That makes sense, you shouldn’t get back raw bytes on Python 3. [According to the documentation](https://github.com/PyMySQL/mysqlclient/blob/master/doc/user_guide.rst#functions-and-attributes), `{"use_unicode": true}` is actually implied when you provide `charset`:
   
   > ***charset***
   > If present, the connection character set will be changed to this character set, if they are not equal. Support for changing the character set requires MySQL-4.1 and later server; if the server is too old, UnsupportedError will be raised. This option implies use_unicode=True, but you can override this with use_unicode=False, though you probably shouldn't.
   
   So the code block you mentioned above actually has no effect at all, and is probably a history relic.
   
   I’m guessing MySQL is actually returning the correct result for you (in `utf8mb4` mode; `utf8` can’t handle emojis), and it’s just that your terminal is not capable of rendering it (a very common issue on Python 2). So maybe you can try doing something with the returned data anyway (e.g. write it to a file)? That should actually produce correct output (i.e. there is no bug!), if I’m not mistaken.
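
The file check suggested above can be sketched as follows. This is a minimal Python 3 illustration rather than anything from the thread: `value` is a stand-in for a string fetched through the hook (an assumption, since no real database is queried here), and the temporary file is only there to keep the terminal out of the picture.

```python
import os
import tempfile

# Stand-in for a value fetched from MySQL with charset utf8mb4 (assumption).
value = "\U0001F603"  # the 😃 character

# Write the value to a file as UTF-8 instead of printing it to a terminal.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write(value)

# Inspect the raw bytes: U+1F603 is the four-byte UTF-8 sequence F0 9F 98 83.
with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

print(raw)  # b'\xf0\x9f\x98\x83'
```

If the bytes on disk match, the data itself is intact and any garbling comes from how it is displayed or re-serialized later.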





[GitHub] [airflow] quoc-t-le edited a comment on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
quoc-t-le edited a comment on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-819530967


   My DAG is actually exporting to Google Cloud Storage in CSV format using https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/operators/mysql_to_gcs.py. I was able to look at the resulting CSV, and it does not contain the emoji when using charset utf8mb4.





[GitHub] [airflow] quoc-t-le commented on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
quoc-t-le commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-819530967


   My DAG is actually exporting to Google Cloud Storage in CSV format, which is then imported into BigQuery. I was able to look at the resulting CSV, and it does not contain the emoji when telling Airflow to use charset utf8mb4.





[GitHub] [airflow] quoc-t-le edited a comment on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
quoc-t-le edited a comment on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-820639775


   Alright, I updated to Python 3.7 and Airflow 1.10.15, and the task exporting from MySQL into GCS still produces:
   ? --- when charset is utf8
   \ud83d\ude03 --- when charset is utf8mb4
   
   expected to be 😃 <-- in the json
   
   Not sure what else I can do.
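
One way an escaped form like `\ud83d\ude03` can appear in JSON output: Python's `json.dumps` escapes non-ASCII characters by default (`ensure_ascii=True`), and characters above U+FFFF are written as a surrogate pair. Whether this is what the export task does is an assumption, not something confirmed in this thread. The `row` dict below is a made-up example.

```python
import json

row = {"comment": "\U0001F603"}  # hypothetical row containing 😃

# Default behaviour: non-ASCII characters are escaped as \uXXXX sequences,
# using a surrogate pair for code points above U+FFFF.
escaped = json.dumps(row)
print(escaped)  # {"comment": "\ud83d\ude03"}

# With ensure_ascii=False the character itself is kept in the output string.
literal = json.dumps(row, ensure_ascii=False)
print(literal)

# Both forms are valid JSON and decode back to the same value.
print(json.loads(escaped) == json.loads(literal) == row)  # True
```

A JSON parser reads both forms as the same character, so the escaped output is only a problem for consumers that treat the file as raw text.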





[GitHub] [airflow] quoc-t-le removed a comment on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
quoc-t-le removed a comment on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-819540989


   The documentation says we can override this with use_unicode=False. But when I look at the MySQL hook, https://github.com/apache/airflow/blob/v1-10-stable/airflow/hooks/mysql_hook.py, use_unicode is only set to True (so I am assuming it is otherwise False?) if the charset we pass in the extra JSON is utf8 / utf-8, not utf8mb4.





[GitHub] [airflow] quoc-t-le commented on issue #15355: MysqlHook Utf8mb4

Posted by GitBox <gi...@apache.org>.
quoc-t-le commented on issue #15355:
URL: https://github.com/apache/airflow/issues/15355#issuecomment-820639775


   Alright, I updated to Python 3.7 and Airflow 1.10.15, and the task exporting from MySQL into GCS still produces:
   ? --- when charset is utf8
   \ud83d\ude03 --- when charset is utf8mb4
   
   expected to be 😃 <-- in the csv
   
   Not sure what else I can do.

