Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/23 08:50:06 UTC

[GitHub] [airflow] zzsza commented on pull request #23695: Sql to gcs with exclude columns

zzsza commented on PR #23695:
URL: https://github.com/apache/airflow/pull/23695#issuecomment-1223759682

   @jaegwonseo 
   
   Hello :)
   I have also been thinking about how to exclude columns for privacy reasons when exporting data from RDS. Thank you for adding this useful feature.
   
   I have a question after trying it out.
   
   When a column is added to exclude_columns, it appears to be removed only from the header, while its values remain in the rows. I think the values should be removed as well.
   
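   (Since you are Korean, I originally added this note in Korean: when I currently pass a value to the operator, the column is removed from the CSV header, but the values are left as they are. I am curious whether you fetch only specific columns in your SELECT in the first place.)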
   
   
   For example, suppose the MySQL query returns:
   
    user_id | name | phone | created_at | updated_at | is_agree
    1001 | seongyun | 010-0000-0000 | 2022-01-23 15:01:00 | 2022-08-10 03:00:00 | True
   
   When exporting this data to Cloud Storage, I set exclude_columns = ['name', 'phone'].
   
   Expected results are:
   
    user_id | created_at | updated_at | is_agree
    1001 | 2022-01-23 15:01:00 | 2022-08-10 03:00:00 | True
   
   However, with the current behavior, the file is saved as:
    user_id | created_at | updated_at | is_agree
    1001 | **seongyun | 010-0000-0000** | 2022-01-23 15:01:00
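
To make the expected behavior concrete, here is a minimal sketch in plain Python. It is not the operator's actual code: `write_csv` is a hypothetical helper, and the only assumption is that rows arrive as tuples in the same order as the column list. The point is that the excluded columns are dropped by index from both the header and every row.

```python
import csv
import io


def write_csv(columns, rows, exclude_columns=()):
    """Hypothetical helper: render rows as CSV text without the excluded columns."""
    excluded = set(exclude_columns)
    # Work out once which column positions survive the exclusion.
    keep = [i for i, name in enumerate(columns) if name not in excluded]

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Apply the same index filter to the header ...
    writer.writerow([columns[i] for i in keep])
    # ... and to each row, so header and values stay aligned.
    for row in rows:
        writer.writerow([row[i] for i in keep])
    return buf.getvalue()


columns = ["user_id", "name", "phone", "created_at", "updated_at", "is_agree"]
rows = [
    (1001, "seongyun", "010-0000-0000", "2022-01-23 15:01:00", "2022-08-10 03:00:00", True),
]

output = write_csv(columns, rows, exclude_columns=["name", "phone"])
print(output)
```

With the sample row above, the header and the single data row each contain only user_id, created_at, updated_at, and is_agree.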
   
   Am I misunderstanding something?
   Thank you for reading.

