Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/07 07:52:39 UTC

[GitHub] [airflow] hsrocks opened a new issue #22037: Airflow Operator for Starting Amazon Glue DataBrew Job

hsrocks opened a new issue #22037:
URL: https://github.com/apache/airflow/issues/22037


   ### Description
   
   We can use the DataBrew integration to add data cleaning and normalization steps to our analytics and machine learning workflows. The operator will call the [StartJobRun API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/databrew.html#GlueDataBrew.Client.start_job_run) to start the job run. As with other available operators, we will also provide an option to wait for completion, in case someone wants the job run to finish before the next task is triggered.
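
As a rough illustration of the behaviour described above, here is a minimal sketch of the start-and-wait logic, not the proposed operator itself. It assumes `client` behaves like `boto3.client("databrew")`, whose `start_job_run` and `describe_job_run` calls are used below. The function name `run_databrew_job` and the parameters `wait_for_completion` and `poll_interval` are hypothetical names chosen for this example.

```python
import time

# Terminal states that DataBrew's DescribeJobRun call can report.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def run_databrew_job(client, job_name, wait_for_completion=True, poll_interval=30.0):
    """Start a DataBrew job run and optionally block until it finishes.

    `client` is assumed to behave like boto3.client("databrew").
    Returns (run_id, state); state is None when not waiting.
    """
    # StartJobRun returns the identifier of the newly created run.
    run_id = client.start_job_run(Name=job_name)["RunId"]
    if not wait_for_completion:
        return run_id, None

    # Poll DescribeJobRun until the run reaches a terminal state.
    while True:
        state = client.describe_job_run(Name=job_name, RunId=run_id)["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_interval)

    # Treat anything other than SUCCEEDED as a failure of the task.
    if state != "SUCCEEDED":
        raise RuntimeError(
            f"DataBrew job {job_name!r} run {run_id} ended in state {state}"
        )
    return run_id, state
```

An operator could wrap this in its `execute` method and obtain the client from an AWS hook. Raising on any non-successful terminal state is one possible design choice, so that downstream tasks are not triggered after a failed run.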
   
   ### Use case/motivation
   
   AWS Glue DataBrew is a visual data preparation tool that enables users to clean and normalize data without writing any code. Once a Glue DataBrew project is set up for an ML or analytics engineer, this API can add value in use cases such as normalizing or cleaning data before triggering a SageMaker training or inference job, or validating the results with Glue or Athena once the cleaned data is available.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] josh-fell commented on issue #22037: Airflow Operator for Starting Amazon Glue DataBrew Job

Posted by GitBox <gi...@apache.org>.
josh-fell commented on issue #22037:
URL: https://github.com/apache/airflow/issues/22037#issuecomment-1060745717


   All yours @hsrocks!

