You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/11 13:32:18 UTC

[GitHub] [airflow] eskarimov commented on a change in pull request #17446: Add support of `path` parameter for GCloud Storage Transfer Service operators

eskarimov commented on a change in pull request #17446:
URL: https://github.com/apache/airflow/pull/17446#discussion_r686834461



##########
File path: airflow/providers/google/cloud/hooks/gcs.py
##########
@@ -1207,3 +1204,7 @@ def _parse_gcs_url(gsurl: str) -> Tuple[str, str]:
     # Remove leading '/' but NOT trailing one
     blob = parsed_url.path.lstrip('/')
     return bucket, blob
+
+
+def _normalize_directory_path(source_object: Optional[str]) -> Optional[str]:
+    return source_object + "/" if source_object and not source_object.endswith("/") else source_object

Review comment:
       Hi @uranusjr ,
   thank you for reviewing, I'd like to share why I think it'd be better to keep the function in its current place:
   - If we move it under `airflow.utils` it might be treated as a general function for normalizing directory path (for instance removing redundant separators or converting slashes to backslashes for Windows, etc), while here it's purely about having a slash at the end of the path.
   
   - Another options is moving it under `airflow.providers.google.cloud.utils`, but it seems less logical to me, because the function is related with normalizing GCS path, I'd expect to find this function under the class/file related with GCS together with other functions.
   
   Regarding the underscore prefix: I understand and share your point to get rid of the leading underscore, as we import this function in other modules, so it doesn't seem like something for internal use.
   On another side I've tried to follow the same pattern as the previous function in the same file [`_parse_gcs_url`](https://github.com/apache/airflow/blob/8d2d264cdd7bb79b85b2b6124c7fc596ac9142f8/airflow/providers/google/cloud/hooks/gcs.py#L1192-L1206) - it's used across [BigQuery ](https://github.com/apache/airflow/blob/67cbb0f181f806edb16ca12fb7a2638b5f31eb58/airflow/providers/google/cloud/operators/bigquery.py#L39), [AzureFileShareToGCSOperator ](https://github.com/apache/airflow/blob/866a601b76e219b3c043e1dbbc8fb22300866351/airflow/providers/google/cloud/transfers/azure_fileshare_to_gcs.py#L24), [S3ToGCSOperator](https://github.com/apache/airflow/blob/866a601b76e219b3c043e1dbbc8fb22300866351/airflow/providers/google/cloud/transfers/s3_to_gcs.py#L25) operators. Please let me know your thoughts.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org