Posted to dev@airflow.apache.org by airflowuser <ai...@protonmail.com.INVALID> on 2018/10/14 13:15:51 UTC

How do you branch your code with BigQuery?

I believe this is quite a common case when working with data:

if something: do A
else: do B

In plain Python code, BranchPythonOperator is the solution.

But when working with BigQuery on Google Cloud, there is no built-in operator that does this.
All existing operators are designed to continue or fail based on a comparison with a specific value:
BigQueryValueCheckOperator with pass_value=500 will continue if the query returns 500 and fail in any other case. The same goes for the other check operators. You must know the value in advance for this to work, and it's not an actual branch but rather a way to stop the workflow if an unexpected result is found.
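The difference between a check and a branch can be modelled in a few lines of plain Python. This is a simplified sketch, not Airflow's actual implementation (the real BigQueryValueCheckOperator also accepts a tolerance, for example); the names CheckFailed, value_check and branch are hypothetical.

```python
class CheckFailed(Exception):
    """Models a failed check task: the workflow stops here."""


def value_check(result, pass_value):
    """Check-style decision: the only outcomes are continue (return) or fail (raise)."""
    if result != pass_value:
        raise CheckFailed(f"expected {pass_value}, got {result}")


def branch(result, threshold, below_task_id, otherwise_task_id):
    """Branch-style decision: never fails, only names the task to run next."""
    return below_task_id if result < threshold else otherwise_task_id


# value_check(500, 500) continues; value_check(499, 500) raises CheckFailed.
# branch(499, 1000, "task_a", "task_b") returns "task_a".
```

A check needs the expected value up front and can only stop the run, while a branch needs only a rule and always lets the run continue down one path.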

But how do you handle a scenario where you want to do A or B based on a condition from a query result? Nothing needs to fail; it's just a simple branch.

XCom could solve it, but BigQueryOperator does not push query results to XCom yet:

https://stackoverflow.com/questions/52801318/airflow-how-to-push-xcom-value-from-bigqueryoperator

For example, say the query returns the number of frauds. If it's < 1000, you want to email specific users (EmailOperator); if it's >= 1000, you want to run another operator and continue the workflow.
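One way to express the fraud example with stock operators is sketched below, assuming an upstream PythonOperator (hypothetical task_id `count_frauds`) runs the query, for instance through the BigQuery hook, and returns the count. In Airflow 1.10 a PythonOperator's return value is pushed to XCom, and a BranchPythonOperator follows the task whose task_id its callable returns. The task_ids `email_users` and `continue_workflow` are hypothetical.

```python
FRAUD_THRESHOLD = 1000  # threshold taken from the example above


def choose_fraud_branch(**context):
    """python_callable for a BranchPythonOperator (needs provide_context=True on Airflow 1.x).

    Pulls the fraud count returned by the hypothetical upstream task
    'count_frauds' from XCom, then returns the task_id to follow.
    """
    fraud_count = context["ti"].xcom_pull(task_ids="count_frauds")
    if fraud_count is None:
        raise ValueError("no fraud count found in XCom for 'count_frauds'")
    if int(fraud_count) < FRAUD_THRESHOLD:
        return "email_users"        # hypothetical EmailOperator task
    return "continue_workflow"      # hypothetical downstream task


# Wiring sketch (Airflow 1.10 import path; not executed here):
# from airflow.operators.python_operator import BranchPythonOperator
# branch = BranchPythonOperator(task_id="branch_on_frauds",
#                               python_callable=choose_fraud_branch,
#                               provide_context=True, dag=dag)
# branch >> [email_users, continue_workflow]
```

The branch task itself never fails on a "bad" count; it only skips the path that was not chosen.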

Any thoughts on the matter would be appreciated.

Re: How do you branch your code with BigQuery?

Posted by Anthony Brown <an...@johnlewis.co.uk>.
I do intend to create a PR (when I get the chance) to get this into the main
Airflow repo.

If anybody has any comments about this before I do, please let me know.


On Mon, 15 Oct 2018 at 10:33, airflowuser
<ai...@protonmail.com.invalid> wrote:

> Awesome!
> I think this would be a fine addition to the BigQuery operators if you
> ever think about PR this to airflow master
>
> cheers

-- 

Anthony Brown
Data Engineer BI Team - John Lewis
Tel : 0787 215 7305

Re: How do you branch your code with BigQuery?

Posted by airflowuser <ai...@protonmail.com.INVALID>.
Awesome!
I think this would be a fine addition to the BigQuery operators, if you ever think about submitting it as a PR to Airflow master.

cheers

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, October 15, 2018 10:02 AM, Anthony Brown <an...@johnlewis.co.uk> wrote:

> Hi
> I have created a custom plugin that allows you to branch on the results
> of a BigQuery query. The code for it is at
> https://github.com/JohnLewisandPartners/custom-airflow-plugins/blob/master/bq_branch/plugins/bq_branch.py.
> The version in master only works on airflow 1.10, but there is a branch
> called airflow_1.9 that also contains the latest BigQuery hook and so works
> on airflow 1.9
>
> The query you run must return true for all columns - the same as for the
> BigQuery check operator, so you may need to rewrite your queries to do this



Re: How do you branch your code with BigQuery?

Posted by Anthony Brown <an...@johnlewis.co.uk>.
Hi,

I have created a custom plugin that allows you to branch on the results
of a BigQuery query. The code for it is at
https://github.com/JohnLewisandPartners/custom-airflow-plugins/blob/master/bq_branch/plugins/bq_branch.py.
The version in master only works on Airflow 1.10, but there is a branch
called airflow_1.9 that also contains the latest BigQuery hook, and so works
on Airflow 1.9.

The query you run must return true for every column it returns (the same rule
as the BigQuery check operator), so you may need to rewrite your queries to do this.
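The "true for every column" rule mirrors how the check operators evaluate the first row of a result. A minimal sketch of that decision follows, assuming a `get_first`-style function that returns the first row as a tuple (in Airflow 1.10 the BigQuery hook inherits `get_first` from DbApiHook). This is not the plugin's actual code; `branch_on_query`, the task_ids and the table name are hypothetical. It also shows how the fraud query from the original question could be rewritten to return a single boolean column.

```python
from typing import Callable, Optional, Sequence

# The fraud question rewritten so that its only column is a boolean
# (standard SQL; the table name is a hypothetical placeholder).
FRAUD_SQL = "SELECT COUNT(*) < 1000 AS few_frauds FROM `project.dataset.frauds`"


def branch_on_query(
    get_first: Callable[[str], Optional[Sequence]],
    sql: str,
    true_task_id: str,
    false_task_id: str,
) -> str:
    """Return true_task_id only if a first row exists and every column in it is truthy."""
    row = get_first(sql)
    if row and all(bool(value) for value in row):
        return true_task_id
    return false_task_id
```

Inside a BranchPythonOperator callable, `get_first` could be the bound method of a BigQueryHook created with `use_legacy_sql=False` (needed for the backtick table reference), e.g. `branch_on_query(hook.get_first, FRAUD_SQL, "email_users", "continue_workflow")`. Treating a missing row as "false" is a design choice of this sketch; the check operators fail instead.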



On Sun, 14 Oct 2018 at 14:16, airflowuser
<ai...@protonmail.com.invalid> wrote:



-- 

Anthony Brown
Data Engineer BI Team - John Lewis
Tel : 0787 215 7305
**********************************************************************
This email is confidential and may contain copyright material of the John Lewis Partnership. 
If you are not the intended recipient, please notify us immediately and delete all copies of this message. 
(Please note that it is your responsibility to scan this message for viruses). Email to and from the
John Lewis Partnership is automatically monitored for operational and lawful business reasons.
**********************************************************************

John Lewis plc
Registered in England 233462
Registered office 171 Victoria Street London SW1E 5NN
 
Websites: https://www.johnlewis.com 
http://www.waitrose.com 
https://www.johnlewisfinance.com
http://www.johnlewispartnership.co.uk
 
**********************************************************************