Amit Joshi
28 August: Amit Joshi
The goal of this blog is to answer two questions: what is Airflow, and how does it organize and run workflows?
Airflow is a platform to programmatically author, schedule and monitor workflows and data pipelines.
Use Airflow to author workflows as Directed Acyclic Graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
Airflow is a platform that lets you build and run workflows. A workflow is represented as a DAG (a Directed Acyclic Graph), and contains individual pieces of work called Tasks, arranged with dependencies and data flows taken into account.
A DAG specifies the dependencies between Tasks, and the order in which to execute them and run retries; the Tasks themselves describe what to do, be it fetching data, running analysis, triggering other systems, or more.
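To make that concrete, here is a minimal sketch of a DAG file, assuming the Airflow 2.x Python API; the DAG name, schedule, and bash commands are illustrative placeholders rather than anything prescribed by Airflow.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_pipeline",          # illustrative name
    start_date=datetime(2021, 1, 1),    # first date the DAG runs for
    schedule_interval="@daily",         # run once per day
    catchup=False,                      # skip backfilling past dates
) as dag:
    # Each operator instance becomes a Task in the DAG.
    fetch_data = BashOperator(task_id="fetch_data", bash_command="echo fetching")
    run_analysis = BashOperator(task_id="run_analysis", bash_command="echo analyzing")

    # Dependency: fetch_data must succeed before run_analysis starts.
    fetch_data >> run_analysis

The file itself is just Python: the scheduler imports it, builds the graph from the operator instances, and runs the tasks in dependency order.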
An Airflow installation generally consists of the following components: a scheduler, which triggers scheduled workflows and submits tasks to the executor; an executor, which runs the tasks (in the default setup inside the scheduler itself, and in production usually on an array of workers); a webserver, which serves the user interface; a folder of DAG files, read by the scheduler and executor; and a metadata database, used by all of the above to store state.
DAGs are designed to be run many times, and multiple runs of them can happen in parallel. DAGs are parameterized, always including a date they are "running for" (the execution_date), but with other optional parameters as well.
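As a rough sketch of that parameterization (the DAG and task names here are made up), the execution_date is available to tasks through Airflow's built-in templating, for example via the {{ ds }} variable:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="parameterized_pipeline",    # illustrative name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # "{{ ds }}" is templated at runtime into the execution_date of this
    # particular run, formatted as YYYY-MM-DD.
    print_run_date = BashOperator(
        task_id="print_run_date",
        bash_command="echo 'This run is for {{ ds }}'",
    )

Because each run carries its own execution_date, backfills and parallel runs of the same DAG can each operate on their own slice of data.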
Tasks have dependencies declared on each other. You'll see this in a DAG either using the >> and << operators:
first_task >> [second_task, third_task]
third_task << fourth_task
Or, with the set_upstream and set_downstream methods:
first_task.set_downstream([second_task, third_task])
third_task.set_upstream(fourth_task)
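Both snippets express the same relationships. As a self-contained sketch (the DAG name is made up, and EmptyOperator assumes a recent Airflow 2.x release; older versions use DummyOperator instead), a full DAG file using the operator style might look like this:

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="dependencies_demo",         # illustrative name
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,             # only run when triggered manually
) as dag:
    first_task = EmptyOperator(task_id="first_task")
    second_task = EmptyOperator(task_id="second_task")
    third_task = EmptyOperator(task_id="third_task")
    fourth_task = EmptyOperator(task_id="fourth_task")

    # first_task runs before second_task and third_task;
    # fourth_task runs before third_task.
    first_task >> [second_task, third_task]
    third_task << fourth_task

    # The set_downstream / set_upstream calls shown above would declare
    # exactly the same dependencies.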
Airflow comes with a user interface that lets you see what DAGs and their tasks are doing, trigger runs of DAGs, view logs, and do some limited debugging and resolution of problems with your DAGs.
It's generally the best way to see the status of your Airflow installation as a whole, as well as for diving into individual DAGs to see their layout, the status of each task, and the logs from each task.