Posted to dev@airflow.apache.org by Niels Zeilemaker <ni...@zeilemaker.nl> on 2017/10/23 07:30:29 UTC

Sensors

Hi Guys,

I've created a Sensor that monitors the number of files in an
Azure Blob store. If the number of files increases, I would like
to trigger another DAG. This is more or less similar to the
example_trigger_controller_dag.py and example_trigger_target_dag.py
setup.

However, after triggering the target DAG I would want my controller
DAG to start monitoring the Blob store again. But since the schedule of
the controller DAG is set to None, it doesn't continue monitoring. I
"fixed" this by adding a TriggerDagRunOperator that schedules a new run
of the controller DAG. But this feels a bit like a hack.

Does anyone have experience with such a continuous-monitoring
sensor? Or know of a better way to achieve this?

Thanks,
Niels
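The core of such a sensor boils down to comparing the current file count with the last one seen. A minimal, framework-free sketch of that check (the class name and state handling here are illustrative, not part of the Airflow API):

```python
class BlobCountMonitor:
    """Tracks the last seen blob count; poke() mimics a sensor's check."""

    def __init__(self, initial_count=0):
        self.last_count = initial_count

    def poke(self, current_count):
        """Return True (i.e. fire the trigger) when new files appeared."""
        if current_count > self.last_count:
            self.last_count = current_count
            return True
        return False
```

In a real sensor, `current_count` would come from an Azure Blob hook, and the last seen count would have to be persisted somewhere (e.g. an Airflow Variable) so it survives across runs.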

Re: Sensors

Posted by Bolke de Bruin <bd...@gmail.com>.
You could, of course, secure the endpoint with nginx and use some form of basic auth, or even OAuth.

B.

Sent from my iPad
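On the client side, basic auth behind such a proxy is just one extra header on the request. A standard-library sketch (the credentials and URL are placeholders):

```python
import base64
import urllib.request


def with_basic_auth(request, username, password):
    """Attach an HTTP Basic Authorization header to a prepared request."""
    creds = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(creds).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    return request


# e.g. with_basic_auth(urllib.request.Request("http://airflow:8080/"),
#                      "airflow_user", "s3cret")
```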

> Op 23 okt. 2017 om 19:20 heeft Niels Zeilemaker <ni...@zeilemaker.nl> het volgende geschreven:
> 
> Unfortunately, we are not using kerberos, hence we cannot use the rest
> api...
> 
> I'll have a look if I can implement http basic auth. That's probably the
> best option. Similar to Grant I'm not to happy with the very long running
> sensor job.
> 
> Niels
> 
> 
> Op 23 okt. 2017 7:10 p.m. schreef "Bolke de Bruin" <bd...@gmail.com>:
> 
> I think you can do something like Azure functions blob storage binding and
> let that kick off a dag by triggering it from the Rest API:
> 
> https://docs.microsoft.com/en-us/azure/azure-functions/
> functions-bindings-storage-blob <https://docs.microsoft.com/
> en-us/azure/azure-functions/functions-bindings-storage-blob>
> 
> I don’t use Azure so it might not fit your case.
> 
> Bolke
> 
>> On 23 Oct 2017, at 16:15, Grant Nicholas <grantnicholas2015@u.
> northwestern.edu> wrote:
>> 
>> It sounds like you want a background daemon that continuously monitors the
>> status of some external system and triggers things on a condition. This
>> does not sound like an ETL job, and thus airflow is not a great fit for
>> this type of problem. That said, there are workarounds like you mentioned.
>> One easy workaround if you can handle a delay between `condition happens
> ->
>> dag triggers` is setting your controller dag to have a recurring schedule
>> (ie: not None). Then when that controlling dag is triggered, you just
>> perform your sensor check once and then trigger/don't trigger another dag
>> depending on the condition. The thing I'd be worried about with your
>> `trigger dagrun` approach is if the trigger dagrun operator fails for any
>> reason you'll stop monitoring the external system, while with the
> scheduled
>> approach you don't have to worry about the failure modes of retrying
> failed
>> dags/etc.
>> 
>> On Mon, Oct 23, 2017 at 2:30 AM, Niels Zeilemaker <ni...@zeilemaker.nl>
>> wrote:
>> 
>>> Hi Guys,
>>> 
>>> I've created a Sensor which is monitoring the number of files in an
>>> Azure Blobstore. If the number of files increases, then I would like
>>> to trigger another dag. This is more or less similar to the
>>> example_trigger_controller_dag.py and example_trigger_target_dag.py
>>> setup.
>>> 
>>> However, after triggering the target DAG I would want my controller
>>> DAG to start monitoring the Blobstore again. But since the schedule of
>>> the controller DAG is set to None, it doesn't continue monitoring. I
>>> "fixed" this by adding a TriggerDAG which schedules a new run of the
>>> Controller DAG. But this feels a bit like a hack.
>>> 
>>> Does someone have any experience which such a continuous monitoring
>>> sensor? Or know of a better way to achieve this?
>>> 
>>> Thanks,
>>> Niels
>>> 

Re: Sensors

Posted by Niels Zeilemaker <ni...@zeilemaker.nl>.
Unfortunately, we are not using Kerberos, hence we cannot use the REST
API...

I'll have a look at whether I can implement HTTP basic auth. That's
probably the best option. Like Grant, I'm not too happy with the very
long-running sensor job.

Niels



Re: Sensors

Posted by Bolke de Bruin <bd...@gmail.com>.
I think you can do something like an Azure Functions blob storage binding, and let that kick off a DAG by triggering it from the REST API:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob

I don’t use Azure so it might not fit your case.

Bolke
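Triggering a DAG from such a function comes down to a single POST against the experimental endpoint. A standard-library sketch of building that request (the base URL and dag id are placeholders; the endpoint shape follows the Airflow 1.x experimental API):

```python
import json
import urllib.request


def build_trigger_request(base_url, dag_id, conf=None):
    """Build a POST for Airflow's experimental dag_runs endpoint."""
    url = "{}/api/experimental/dags/{}/dag_runs".format(
        base_url.rstrip("/"), dag_id
    )
    payload = json.dumps({"conf": conf or {}}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )


# Sending it (e.g. from inside the Azure Function) would then be:
#   urllib.request.urlopen(
#       build_trigger_request("http://airflow:8080", "example_trigger_target_dag"))
```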



Re: Sensors

Posted by Grant Nicholas <gr...@u.northwestern.edu>.
It sounds like you want a background daemon that continuously monitors the
status of some external system and triggers things on a condition. This
does not sound like an ETL job, and thus Airflow is not a great fit for
this type of problem. That said, there are workarounds like the one you
mentioned. One easy workaround, if you can handle a delay between
`condition happens -> dag triggers`, is setting your controller dag to
have a recurring schedule (i.e. not None). Then when that controlling dag
is triggered, you just perform your sensor check once and then
trigger/don't trigger another dag depending on the condition. The thing
I'd be worried about with your `trigger dagrun` approach is that if the
trigger dagrun operator fails for any reason, you'll stop monitoring the
external system, while with the scheduled approach you don't have to worry
about the failure modes of retrying failed dags, etc.
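A rough sketch of this scheduled-check pattern, using Airflow 1.x module paths (the dag ids, interval, and the body of the check are placeholders, and the exact TriggerDagRunOperator signature varies somewhat across 1.x releases):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator
from airflow.operators.python_operator import ShortCircuitOperator


def blob_count_increased(**context):
    # Placeholder: compare the current blob count against the last seen
    # value (stored e.g. in a Variable) and return True only when new
    # files have appeared. Returning False short-circuits the trigger.
    raise NotImplementedError


with DAG(
    dag_id="blob_monitor",                   # hypothetical id
    schedule_interval=timedelta(minutes=5),  # recurring, not None
    start_date=datetime(2017, 10, 1),
    catchup=False,
) as dag:
    check = ShortCircuitOperator(
        task_id="check_blob_count",
        python_callable=blob_count_increased,
        provide_context=True,
    )
    trigger = TriggerDagRunOperator(
        task_id="trigger_target",
        trigger_dag_id="example_trigger_target_dag",
    )
    check >> trigger
```

When the check returns False the trigger task is skipped, and the next scheduled run simply checks again; that is what makes the failure handling simpler than a self-retriggering controller.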
