Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/07 23:31:51 UTC

[GitHub] [airflow] fuxiao224 opened a new issue, #22846: UUID encoded in CassandraToGCSOperator but not other operators

fuxiao224 opened a new issue, #22846:
URL: https://github.com/apache/airflow/issues/22846

   ### Apache Airflow version
   
   2.2.5 (latest released)
   
   ### What happened
   
   I noticed that UUID is encoded in CassandraToGCSOperator by:
   
   `elif isinstance(value, UUID):
               return b64encode(value.bytes).decode('ascii')`
   
   Therefore, for example, UUID 000e0000-5719-12a3-0000-000028327d4a is represented as "AA4AAFcZEqMAAAAAKDJ9Sg==". However, this is inconsistent with the other *ToGCSOperator classes. For example, a UUID from MySQL or Oracle is represented as a UTF-8 string of five hexadecimal groups in the format "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", so the same UUID would still be represented as "000e0000-5719-12a3-0000-000028327d4a". In other words, MySQLToGCSOperator and OracleToGCSOperator preserve the UUID in its hyphenated hexadecimal form rather than encoding it.
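
For reference, here is a minimal sketch of the two representations discussed above, using only the Python standard library. The variable names are illustrative and this is not the operator's actual code; it only reproduces the quoted `b64encode(value.bytes)` call next to the plain string form.

```python
from base64 import b64decode, b64encode
from uuid import UUID

value = UUID("000e0000-5719-12a3-0000-000028327d4a")

# Base64 of the 16 raw bytes, mirroring the snippet quoted from the operator.
encoded = b64encode(value.bytes).decode("ascii")

# Canonical hyphenated hexadecimal form (five groups: 8-4-4-4-12 digits).
hex_form = str(value)

# The base64 form is lossless: decoding it recovers the same UUID.
restored = UUID(bytes=b64decode(encoded))

print(encoded)   # AA4AAFcZEqMAAAAAKDJ9Sg==
print(hex_form)  # 000e0000-5719-12a3-0000-000028327d4a
```

Both forms carry the same 128 bits, so the difference is only in how downstream consumers have to parse the exported column.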
   
   I would therefore like to understand the main reason for encoding UUIDs in CassandraToGCSOperator and, if possible, whether we can change it so that UUIDs are not encoded when loading a Cassandra table to GCS with Airflow. Please let me know your thoughts on this issue. Thanks!
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   macOS
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on issue #22846: UUID encoded in CassandraToGCSOperator but not other operators

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #22846:
URL: https://github.com/apache/airflow/issues/22846#issuecomment-1139274133

   Fixed in https://github.com/apache/airflow/pull/23766




[GitHub] [airflow] eladkal closed issue #22846: UUID encoded in CassandraToGCSOperator but not other operators

Posted by GitBox <gi...@apache.org>.
eladkal closed issue #22846: UUID encoded in CassandraToGCSOperator but not other operators
URL: https://github.com/apache/airflow/issues/22846




[GitHub] [airflow] uranusjr commented on issue #22846: UUID encoded in CassandraToGCSOperator but not other operators

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #22846:
URL: https://github.com/apache/airflow/issues/22846#issuecomment-1094001588

   According to #3483, this was done because 
   
   > issue with UUID type conversion: currently UUID is converted to hex string, but should be converted to base64-encoded as that is the required format in BigQuery for uploading.
   
   If this description is taken at face value, the format you proposed would not work? I am not familiar enough with Google services to provide more information, unfortunately.




[GitHub] [airflow] potiuk commented on issue #22846: UUID encoded in CassandraToGCSOperator but not other operators

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #22846:
URL: https://github.com/apache/airflow/issues/22846#issuecomment-1098532233

   Yeah. Adding a parameter will be a good solution. 




[GitHub] [airflow] uranusjr commented on issue #22846: UUID encoded in CassandraToGCSOperator but not other operators

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #22846:
URL: https://github.com/apache/airflow/issues/22846#issuecomment-1095537023

   You can’t simply revert it since that would introduce a backward incompatibility and break existing usages. Perhaps it’s possible to add a flag on the operator to toggle the format used.
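
As a rough illustration of the flag idea, the sketch below toggles between the two formats while keeping the current base64 behaviour as the default, so existing usages would not change. The function name `convert_uuid` and the parameter name `encode_uuid` are hypothetical, chosen for this example only; they are not taken from the operator's code.

```python
from base64 import b64encode
from uuid import UUID


def convert_uuid(value: UUID, encode_uuid: bool = True) -> str:
    """Hypothetical helper: return a UUID as base64 (default) or as a hex string."""
    if encode_uuid:
        # Existing behaviour: base64 of the 16 raw bytes.
        return b64encode(value.bytes).decode("ascii")
    # Opt-in behaviour: canonical hyphenated hexadecimal string.
    return str(value)


example = UUID("000e0000-5719-12a3-0000-000028327d4a")
print(convert_uuid(example))                     # AA4AAFcZEqMAAAAAKDJ9Sg==
print(convert_uuid(example, encode_uuid=False))  # 000e0000-5719-12a3-0000-000028327d4a
```

Defaulting the flag to the existing encoding is what avoids the backward incompatibility: only callers who pass `encode_uuid=False` would see the hex string.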




[GitHub] [airflow] fuxiao224 commented on issue #22846: UUID encoded in CassandraToGCSOperator but not other operators

Posted by GitBox <gi...@apache.org>.
fuxiao224 commented on issue #22846:
URL: https://github.com/apache/airflow/issues/22846#issuecomment-1103128805

   Thanks for your suggestions! I agree that adding a parameter to let users choose whether to encode UUIDs sounds like a good plan. I'll create the PR asap.




[GitHub] [airflow] boring-cyborg[bot] commented on issue #22846: UUID encoded in CassandraToGCSOperator but not other operators

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #22846:
URL: https://github.com/apache/airflow/issues/22846#issuecomment-1092300050

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] fuxiao224 commented on issue #22846: UUID encoded in CassandraToGCSOperator but not other operators

Posted by GitBox <gi...@apache.org>.
fuxiao224 commented on issue #22846:
URL: https://github.com/apache/airflow/issues/22846#issuecomment-1095355541

   Thanks! @uranusjr 
   This PR was opened 4 years ago. I'm not sure whether this was a BigQuery rule at that time, but I don't think the base64-encoded UUID format is a requirement for BigQuery at this point. 




[GitHub] [airflow] fuxiao224 commented on issue #22846: UUID encoded in CassandraToGCSOperator but not other operators

Posted by GitBox <gi...@apache.org>.
fuxiao224 commented on issue #22846:
URL: https://github.com/apache/airflow/issues/22846#issuecomment-1095357689

   I wonder if there are any other concerns if I revert the CassandraToGCSOperator part of #3483 so that UUIDs are converted to a hex string instead of the base64-encoded format? Thanks!




[GitHub] [airflow] uranusjr commented on issue #22846: UUID encoded in CassandraToGCSOperator but not other operators

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #22846:
URL: https://github.com/apache/airflow/issues/22846#issuecomment-1095537819

   cc @jgao54 in case you know some more details on this.

