Airflow and automated OpsGenie alerts

Ole Christian Langfjæran
Nov 10, 2020


Are you looking to automatically create and close alerts when tasks fail in Airflow? Read on, and I’ll show you a little trick.

At Unacast we have been using OpsGenie as our alerting system for a while now. We formerly used PagerDuty, but saw a chance to save a few dollars by switching. It turned out to be a pretty easy switch, as the two are quite similar in user experience and integrations. So I’ll quickly show how we both create and close our alerts in OpsGenie from Airflow.

Creating OpsGenie alerts from Airflow

This one is pretty straightforward, and Airflow even has a Hook for connecting to OpsGenie. To set it up, add an Airflow HTTP connection, using a key you’ve created under “…opsgenie.com/settings/api-key-management” as the password.

The next step is to start using it in your Operators. It basically comes down to passing a function reference as the on_failure_callback, like so:

https://gist.github.com/judoole/90eb55a5ce7c49621eb1e38dcc85da5f

Where the simplest implementation of the OpsGenieExceptionReporter could be something like this:

https://gist.github.com/judoole/ee48814e4964755ed140b494ef9ffd32

I’ll show later how this can be made even nicer, but for now, let’s skip to how we can automatically close alerts. And btw, pay attention to line 11 where we set the alias for the alert. We will reuse this when closing the alert.
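
As a rough illustration of the failure callback described above, here is a minimal sketch that runs without Airflow. It is not the code from the gist. The `send` parameter, the `build_alias` helper and the `dag_id.task_id` alias format are assumptions made for this example; only the payload fields `message`, `alias` and `description` are taken from OpsGenie’s create-alert API.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


def build_alias(context: Dict[str, Any]) -> str:
    """Derive a stable alias so a later success can find the same alert.

    The dag_id.task_id format is an assumption made for this sketch.
    """
    ti = context["task_instance"]
    return f"{ti.dag_id}.{ti.task_id}"


class OpsGenieExceptionReporter:
    """Callable that could be passed as on_failure_callback (sketch only)."""

    def __init__(self, send: Callable[[Dict[str, Any]], Any]):
        # `send` is whatever posts the payload to OpsGenie. It is injected
        # (an assumption of this sketch) so the class can be tried offline.
        self.send = send

    def __call__(self, context: Dict[str, Any]) -> None:
        ti = context["task_instance"]
        payload = {
            "message": f"Airflow task failed: {ti.dag_id}.{ti.task_id}",
            "alias": build_alias(context),
            "description": str(context.get("exception", "")),
        }
        self.send(payload)


# Offline demo with a stand-in for Airflow's task instance.
sent = []
reporter = OpsGenieExceptionReporter(send=sent.append)
reporter({
    "task_instance": SimpleNamespace(dag_id="etl", task_id="load"),
    "exception": ValueError("boom"),
})
```

In a real DAG, `send` would be the method on the OpsGenie hook that posts an alert, and the reporter instance would be passed as `on_failure_callback` on the operator or in `default_args`. Which hook method that is depends on your Airflow version, so check the hook you have installed.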

Automatically closing alerts

To automatically close alerts, let’s hook on to the on_success_callback and create a class for closing alerts. It could look like this:

https://gist.github.com/judoole/b7fe98c888088679cf4470fa28aba667

Notice that we reuse the alias and utilise the OpsGenieAlertHook. We only close an alert if the task is on its second try or later (task_instance.try_number > 1), as we then assume that the previous try failed. OpsGenie doesn’t seem to mind if we try to close an alert that does not exist or is already closed. Unfortunately, this approach also means that manually marking a task as success will not close the alert. The try_number check might not be needed, but we felt it was best not to test OpsGenie’s thresholds here.
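
The closing logic described above can be sketched roughly like this, again without depending on Airflow. The class name `OpsGenieAlertCloser`, the injected `post` callable and the alias format are assumptions for illustration. The endpoint shape (`v2/alerts/{alias}/close?identifierType=alias`) follows OpsGenie’s close-alert API.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict
from urllib.parse import quote


def close_endpoint(alias: str) -> str:
    """Path of OpsGenie's close-alert call, addressing the alert by alias."""
    return f"v2/alerts/{quote(alias, safe='')}/close?identifierType=alias"


class OpsGenieAlertCloser:
    """Callable that could be passed as on_success_callback (sketch only)."""

    def __init__(self, post: Callable[[str, Dict[str, Any]], Any]):
        # `post(endpoint, body)` performs the HTTP POST; injected here
        # (an assumption of this sketch) so the logic can be tried offline.
        self.post = post

    def __call__(self, context: Dict[str, Any]) -> None:
        ti = context["task_instance"]
        if ti.try_number <= 1:
            # First try succeeded, so we assume no alert was ever created.
            return
        alias = f"{ti.dag_id}.{ti.task_id}"  # must match the alias used on failure
        self.post(close_endpoint(alias), {"note": "Closed after a successful retry"})


# Offline demo: only the second call (try_number=2) triggers a close request.
calls = []
closer = OpsGenieAlertCloser(post=lambda endpoint, body: calls.append((endpoint, body)))
closer({"task_instance": SimpleNamespace(dag_id="etl", task_id="load", try_number=1)})
closer({"task_instance": SimpleNamespace(dag_id="etl", task_id="load", try_number=2)})
```

Keeping the alias derivation identical in the failure and success callbacks is what ties the two together. If they drift apart, the close request simply targets an alert that does not exist.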

So there you have it. Easy? Good! Let’s see if we can clean up the message a bit as well.

Jinja templating the message

It would be nice if we could easily customise parts of these alerts. So what better than to use Jinja? We already have the context at hand, so let’s try:

https://gist.github.com/judoole/a5c7d43db777ff8b1d2be80b55fcc757

So now we have a default description and message, which you can easily override per task using Jinja templating. And the code to use it is minimal. Worth mentioning is that this default message is geared towards Slack markdown, because at Unacast we use Slack for our OpsGenie messages. Highly recommended.
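
To make the templating idea concrete, here is a small sketch using the jinja2 package directly. The default templates, the `render_alert` helper and the stand-in context are all assumptions for this example and do not reproduce the gist. The `*bold*` markers are Slack-style markdown.

```python
from types import SimpleNamespace
from typing import Any, Dict

from jinja2 import Template

# Hypothetical defaults; override them per task by passing other strings.
DEFAULT_MESSAGE = "Task {{ task_instance.task_id }} in DAG {{ task_instance.dag_id }} failed"
DEFAULT_DESCRIPTION = (
    "*DAG*: {{ task_instance.dag_id }}\n"
    "*Task*: {{ task_instance.task_id }}\n"
    "*Error*: {{ exception }}"
)


def render_alert(
    context: Dict[str, Any],
    message: str = DEFAULT_MESSAGE,
    description: str = DEFAULT_DESCRIPTION,
) -> Dict[str, str]:
    """Render both templates with the callback context as Jinja variables."""
    return {
        "message": Template(message).render(**context),
        "description": Template(description).render(**context),
    }


# Offline demo with a stand-in for the Airflow callback context.
context = {
    "task_instance": SimpleNamespace(dag_id="etl", task_id="load"),
    "exception": ValueError("boom"),
}
default_alert = render_alert(context)
custom_alert = render_alert(context, message="{{ task_instance.task_id }} is down")
```

Because the templates are plain strings, a per-task override is just another constructor argument on the reporter, and every key in the context is available inside the template.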

Would be very happy to hear if you use OpsGenie in some similar way, or if you have a nice solution to also close alerts on “mark as success”. Or if you just liked the post!


Ole Christian Langfjæran

Senior platform engineer at Unacast. Part time everything else.