I have been working with Airflow for more than three years now and overall, I’m quite confident with it. It’s a powerful orchestrator that helps me build data pipelines quickly and in a scalable fashion, and for most of the things I’m trying to implement it comes with batteries included.
Recently, while preparing for an Airflow certification, I came across many things I had literally no clue about. That was essentially my motivation to write this article and share with you a few Airflow internals that have totally blown my mind!
1. Scheduler only parses files containing certain keywords
The Airflow Scheduler will only parse files that contain both the strings airflow and dag in their code! Yes, you’ve heard this right! If a file under the DAG folder doesn’t contain both of these keywords, it will simply not be parsed by the scheduler.
If you want to change this rule so that it is no longer a requirement for the scheduler, you can simply set the DAG_DISCOVERY_SAFE_MODE configuration setting to False. In that case, the scheduler will parse all files under your DAG folder (/dags).
I wouldn’t recommend disabling this check though, since doing so doesn’t really make any sense. (A proper DAG file will have Airflow imports and a DAG definition, which means the requirements for parsing that file are met.) Nevertheless, it is worth knowing that this rule exists.
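To make the rule concrete, here is a minimal sketch of the kind of check safe mode performs. It is not Airflow’s actual code: the function name might_contain_dag and the safe_mode parameter are hypothetical, and the sketch assumes the behaviour described in the Airflow documentation, where a file is considered only if it contains both the strings airflow and dag (case-insensitively in recent versions).

```python
def might_contain_dag(content: bytes, safe_mode: bool = True) -> bool:
    """Return True if a file with this content would be handed to the DAG parser.

    Hypothetical helper that only illustrates the heuristic; it is not Airflow's API.
    """
    if not safe_mode:
        # With safe mode disabled, every file is treated as a candidate.
        return True
    lowered = content.lower()
    # Assumption: both markers must appear somewhere in the file.
    return all(marker in lowered for marker in (b"airflow", b"dag"))


print(might_contain_dag(b"from airflow import DAG"))          # True
print(might_contain_dag(b"print('hello')"))                   # False
print(might_contain_dag(b"print('hello')", safe_mode=False))  # True
```

Note that this is a plain text search, so a comment mentioning both words would be enough to pass it. It is a cheap pre-filter, not a validation of the file.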
2. Variables with certain keywords in their name have their values hidden
We all know that by default, Airflow will hide sensitive information stored in a Connection (and more specifically in the password field), but what about Variables?
Well, this is indeed possible, and the mind-blowing thing is that Airflow can do this automatically for you. If a variable’s name contains certain keywords that may indicate sensitive information, then its value will automatically be hidden.
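The idea can be sketched as a case-insensitive substring check on the variable’s name. This is only an illustration under stated assumptions, not Airflow’s masking code: the function should_hide_value is hypothetical, and SENSITIVE_KEYWORDS is a small illustrative subset chosen for the example rather than the full list.

```python
# Illustrative subset only; an assumption made for this example.
SENSITIVE_KEYWORDS = ("password", "secret", "api_key")


def should_hide_value(variable_name: str) -> bool:
    """Return True if the variable's name suggests its value is sensitive."""
    name = variable_name.strip().lower()
    # A keyword anywhere inside the name is enough to trigger hiding.
    return any(keyword in name for keyword in SENSITIVE_KEYWORDS)


print(should_hide_value("DB_PASSWORD"))  # True
print(should_hide_value("batch_size"))   # False
```

Because the match is on substrings, a name such as my_secret_bucket would also be hidden under this sketch, even if its value is harmless.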
Here’s a list of keywords that will make a Variable qualify as having sensitive information stored as…