Posted to commits@airflow.apache.org by ma...@apache.org on 2017/03/06 17:10:26 UTC
[21/22] incubator-airflow-site git commit: Latest docs version as of 1.8.x
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/5e574012/_sources/license.rst.txt
----------------------------------------------------------------------
diff --git a/_sources/license.rst.txt b/_sources/license.rst.txt
new file mode 100644
index 0000000..9da26c0
--- /dev/null
+++ b/_sources/license.rst.txt
@@ -0,0 +1,211 @@
+License
+=======
+
+.. image:: img/apache.jpg
+ :width: 150
+
+::
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright 2015 Apache Software Foundation
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/5e574012/_sources/plugins.rst.txt
----------------------------------------------------------------------
diff --git a/_sources/plugins.rst.txt b/_sources/plugins.rst.txt
new file mode 100644
index 0000000..8d2078f
--- /dev/null
+++ b/_sources/plugins.rst.txt
@@ -0,0 +1,144 @@
+Plugins
+=======
+
+Airflow has a simple built-in plugin manager that can integrate external
+features into its core by simply dropping files in your
+``$AIRFLOW_HOME/plugins`` folder.
+
+The Python modules in the ``plugins`` folder are imported,
+and **hooks**, **operators**, **macros**, **executors** and web **views**
+are integrated into Airflow's main collections and become available for use.
+
+What for?
+---------
+
+Airflow offers a generic toolbox for working with data. Different
+organizations have different stacks and different needs. Using Airflow
+plugins can be a way for companies to customize their Airflow installation
+to reflect their ecosystem.
+
+Plugins can be used as an easy way to write, share and activate new sets of
+features.
+
+There's also a need for a set of more complex applications to interact with
+different flavors of data and metadata.
+
+Examples:
+
+* A set of tools to parse Hive logs and expose Hive metadata (CPU / IO / phases / skew / ...)
+* An anomaly detection framework, allowing people to collect metrics, set thresholds and alerts
+* An auditing tool, helping understand who accesses what
+* A config-driven SLA monitoring tool, allowing you to set monitored tables and at what time
+ they should land, alert people, and expose visualizations of outages
+* ...
+
+Why build on top of Airflow?
+----------------------------
+
+Airflow has many components that can be reused when building an application:
+
+* A web server you can use to render your views
+* A metadata database to store your models
+* Access to your databases, and knowledge of how to connect to them
+* An array of workers that your application can push workload to
+* Airflow is already deployed, so you can just piggyback on its deployment logistics
+* Basic charting capabilities, underlying libraries and abstractions
+
+
+Interface
+---------
+
+To create a plugin you will need to derive from the
+``airflow.plugins_manager.AirflowPlugin`` class and reference the objects
+you want to plug into Airflow. Here's what the class you need to derive
+from looks like:
+
+
+.. code:: python
+
+    class AirflowPlugin(object):
+        # The name of your plugin (str)
+        name = None
+        # A list of class(es) derived from BaseOperator
+        operators = []
+        # A list of class(es) derived from BaseHook
+        hooks = []
+        # A list of class(es) derived from BaseExecutor
+        executors = []
+        # A list of references to inject into the macros namespace
+        macros = []
+        # A list of objects created from a class derived
+        # from flask_admin.BaseView
+        admin_views = []
+        # A list of Blueprint objects created from flask.Blueprint
+        flask_blueprints = []
+        # A list of menu links (flask_admin.base.MenuLink)
+        menu_links = []
+
+
+Example
+-------
+
+The code below defines a plugin that injects a set of dummy object
+definitions into Airflow.
+
+.. code:: python
+
+    # This is the class you derive to create a plugin
+    from airflow.plugins_manager import AirflowPlugin
+
+    from flask import Blueprint
+    from flask_admin import BaseView, expose
+    from flask_admin.base import MenuLink
+
+    # Importing base classes that we need to derive
+    from airflow.hooks.base_hook import BaseHook
+    from airflow.models import BaseOperator
+    from airflow.executors.base_executor import BaseExecutor
+
+    # Will show up under airflow.hooks.test_plugin.PluginHook
+    class PluginHook(BaseHook):
+        pass
+
+    # Will show up under airflow.operators.test_plugin.PluginOperator
+    class PluginOperator(BaseOperator):
+        pass
+
+    # Will show up under airflow.executors.test_plugin.PluginExecutor
+    class PluginExecutor(BaseExecutor):
+        pass
+
+    # Will show up under airflow.macros.test_plugin.plugin_macro
+    def plugin_macro():
+        pass
+
+    # Creating a flask admin BaseView
+    class TestView(BaseView):
+        @expose('/')
+        def test(self):
+            # in this example, put your test_plugin/test.html template
+            # at airflow/plugins/templates/test_plugin/test.html
+            return self.render("test_plugin/test.html", content="Hello galaxy!")
+
+    v = TestView(category="Test Plugin", name="Test View")
+
+    # Creating a flask blueprint to integrate the templates and static folder
+    bp = Blueprint(
+        "test_plugin", __name__,
+        template_folder='templates',  # registers airflow/plugins/templates as a Jinja template folder
+        static_folder='static',
+        static_url_path='/static/test_plugin')
+
+    ml = MenuLink(
+        category='Test Plugin',
+        name='Test Menu Link',
+        url='http://pythonhosted.org/airflow/')
+
+    # Defining the plugin class
+    class AirflowTestPlugin(AirflowPlugin):
+        name = "test_plugin"
+        operators = [PluginOperator]
+        hooks = [PluginHook]
+        executors = [PluginExecutor]
+        macros = [plugin_macro]
+        admin_views = [v]
+        flask_blueprints = [bp]
+        menu_links = [ml]
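+
+Once Airflow has loaded a plugin like the one above, the injected objects
+become importable from Airflow's own namespaces. As a minimal sketch
+(assuming the ``test_plugin`` example above and a ``dag`` object defined
+elsewhere):
+
+.. code:: python
+
+    # These module paths only exist after the plugin has been loaded
+    from airflow.operators.test_plugin import PluginOperator
+    from airflow.hooks.test_plugin import PluginHook
+
+    # Use the plugged-in operator like any other operator
+    task = PluginOperator(task_id='plugin_task', dag=dag)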
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/5e574012/_sources/profiling.rst.txt
----------------------------------------------------------------------
diff --git a/_sources/profiling.rst.txt b/_sources/profiling.rst.txt
new file mode 100644
index 0000000..93e6b6b
--- /dev/null
+++ b/_sources/profiling.rst.txt
@@ -0,0 +1,39 @@
+Data Profiling
+==============
+
+Part of being productive with data is having the right weapons to
+profile the data you are working with. Airflow provides a simple query
+interface to write SQL and get results quickly, and a charting application
+that lets you visualize data.
+
+Adhoc Queries
+-------------
+The adhoc query UI allows for simple SQL interactions with the database
+connections registered in Airflow.
+
+.. image:: img/adhoc.png
+
+Charts
+------
+A simple UI built on top of flask-admin and highcharts allows building
+data visualizations and charts easily. Fill in a form with a label, SQL,
+and chart type, pick a source database from your environment's connections,
+select a few other options, and save it for later use.
+
+You can even use the same templating and macros available when writing
+Airflow pipelines, parameterizing your queries and modifying parameters
+directly in the URL.
+
+These charts are basic, but they're easy to create, modify and share.
+
+Chart Screenshot
+................
+
+.. image:: img/chart.png
+
+-----
+
+Chart Form Screenshot
+.....................
+
+.. image:: img/chart_form.png
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/5e574012/_sources/project.rst.txt
----------------------------------------------------------------------
diff --git a/_sources/project.rst.txt b/_sources/project.rst.txt
new file mode 100644
index 0000000..2fbd516
--- /dev/null
+++ b/_sources/project.rst.txt
@@ -0,0 +1,49 @@
+Project
+=======
+
+History
+-------
+
+Airflow was started in October 2014 by Maxime Beauchemin at Airbnb.
+It was open source from the very first commit and was officially brought under
+the Airbnb GitHub organization and announced in June 2015.
+
+The project joined the Apache Software Foundation's incubation program in March 2016.
+
+
+Committers
+----------
+
+- @mistercrunch (Maxime "Max" Beauchemin)
+- @r39132 (Siddharth "Sid" Anand)
+- @criccomini (Chris Riccomini)
+- @bolkedebruin (Bolke de Bruin)
+- @artwr (Arthur Wiedmer)
+- @jlowin (Jeremiah Lowin)
+- @patrickleotardif (Patrick Leo Tardif)
+- @aoen (Dan Davydov)
+- @syvineckruyk (Steven Yvinec-Kruyk)
+
+For the full list of contributors, take a look at `Airflow's GitHub
+Contributor page
+<https://github.com/apache/incubator-airflow/graphs/contributors>`_.
+
+
+Resources & links
+-----------------
+
+* `Airflow's official documentation <http://airflow.apache.org/>`_
+* Mailing list (send emails to
+ ``dev-subscribe@airflow.incubator.apache.org`` and/or
+ ``commits-subscribe@airflow.incubator.apache.org``
+ to subscribe to each)
+* `Issues on Apache's Jira <https://issues.apache.org/jira/browse/AIRFLOW>`_
+* `Gitter (chat) Channel <https://gitter.im/airbnb/airflow>`_
+* `More resources and links to Airflow related content on the Wiki <https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links>`_
+
+
+
+Roadmap
+-------
+
+Please refer to the Roadmap on `the wiki <https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Home>`_
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/5e574012/_sources/scheduler.rst.txt
----------------------------------------------------------------------
diff --git a/_sources/scheduler.rst.txt b/_sources/scheduler.rst.txt
new file mode 100644
index 0000000..749d58a
--- /dev/null
+++ b/_sources/scheduler.rst.txt
@@ -0,0 +1,153 @@
+Scheduling & Triggers
+=====================
+
+The Airflow scheduler monitors all tasks and all DAGs, and triggers the
+task instances whose dependencies have been met. Behind the scenes,
+it monitors and stays in sync with the folder containing all your DAG objects,
+and periodically (every minute or so) inspects active tasks to see whether
+they can be triggered.
+
+The Airflow scheduler is designed to run as a persistent service in an
+Airflow production environment. To kick it off, all you need to do is
+execute ``airflow scheduler``. It will use the configuration specified in
+``airflow.cfg``.
+
+Note that if you run a DAG on a ``schedule_interval`` of one day,
+the run stamped ``2016-01-01`` will be triggered soon after ``2016-01-01T23:59``.
+In other words, the job instance is started once the period it covers
+has ended.
+
+**Let's Repeat That** The scheduler runs your job one ``schedule_interval`` AFTER the
+start date, at the END of the period.
+
+The scheduler starts an instance of the executor specified in your
+``airflow.cfg``. If it happens to be the ``LocalExecutor``, tasks will be
+executed as subprocesses; in the case of ``CeleryExecutor`` and
+``MesosExecutor``, tasks are executed remotely.
+
+To start a scheduler, simply run the command:
+
+.. code:: bash
+
+ airflow scheduler
+
+
+DAG Runs
+''''''''
+
+A DAG Run is an object representing an instantiation of the DAG in time.
+
+Each DAG may or may not have a schedule, which informs how ``DAG Runs`` are
+created. ``schedule_interval`` is defined as a DAG argument, and receives
+preferably a
+`cron expression <https://en.wikipedia.org/wiki/Cron#CRON_expression>`_ as
+a ``str``, or a ``datetime.timedelta`` object. Alternatively, you can also
+use one of these cron "presets":
+
++--------------+----------------------------------------------------------------+---------------+
+| preset | Run once a year at midnight of January 1 | cron |
++==============+================================================================+===============+
+| ``None`` | Don't schedule, use for exclusively "externally triggered" | |
+| | DAGs | |
++--------------+----------------------------------------------------------------+---------------+
+| ``@once`` | Schedule once and only once | |
++--------------+----------------------------------------------------------------+---------------+
+| ``@hourly`` | Run once an hour at the beginning of the hour | ``0 * * * *`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@daily`` | Run once a day at midnight | ``0 0 * * *`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@weekly`` | Run once a week at midnight on Sunday morning | ``0 0 * * 0`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@monthly`` | Run once a month at midnight of the first day of the month | ``0 0 1 * *`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@yearly`` | Run once a year at midnight of January 1 | ``0 0 1 1 *`` |
++--------------+----------------------------------------------------------------+---------------+
+
+
+Your DAG will be instantiated for each schedule, and a ``DAG Run`` entry
+will be created for each one.
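+
+For example, a minimal sketch of a DAG declaring a daily schedule (the
+``dag_id`` and ``start_date`` are illustrative):
+
+.. code:: python
+
+    from datetime import datetime, timedelta
+
+    from airflow import DAG
+
+    # '@daily', '0 0 * * *' and timedelta(days=1) are equivalent spellings here
+    dag = DAG(
+        'my_daily_dag',
+        start_date=datetime(2016, 1, 1),
+        schedule_interval='@daily')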
+
+DAG runs have a state associated with them (running, failed, success) and
+inform the scheduler on which set of schedules should be evaluated for
+task submissions. Without the metadata at the DAG run level, the Airflow
+scheduler would have much more work to do in order to figure out what tasks
+should be triggered, and would come to a crawl. It might also create undesired
+processing when changing the shape of your DAG, by say adding in new
+tasks.
+
+Backfill and Catchup
+''''''''''''''''''''
+
+An Airflow DAG with a ``start_date``, possibly an ``end_date``, and a ``schedule_interval`` defines a
+series of intervals which the scheduler turns into individual DAG Runs and executes. A key capability of
+Airflow is that these DAG Runs are atomic, idempotent items, and the scheduler, by default, will examine
+the lifetime of the DAG (from start to end/now, one interval at a time) and kick off a DAG Run for any
+interval that has not been run (or has been cleared). This concept is called Catchup.
+
+If your DAG is written to handle its own catchup (i.e. not limited to the interval, but instead to "now",
+for instance), then you will want to turn catchup off, either on the DAG itself with ``dag.catchup =
+False`` or by default at the configuration file level with ``catchup_by_default = False``. What this
+will do is instruct the scheduler to only create a DAG Run for the most current instance of the DAG
+interval series.
+
+.. code:: python
+
+    """
+    Code that goes along with the Airflow tutorial located at:
+    https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py
+    """
+    from airflow import DAG
+    from airflow.operators.bash_operator import BashOperator
+    from datetime import datetime, timedelta
+
+
+    default_args = {
+        'owner': 'airflow',
+        'depends_on_past': False,
+        'start_date': datetime(2015, 12, 1),
+        'email': ['airflow@airflow.com'],
+        'email_on_failure': False,
+        'email_on_retry': False,
+        'retries': 1,
+        'retry_delay': timedelta(minutes=5),
+        'schedule_interval': '@hourly',
+    }
+
+    dag = DAG('tutorial', catchup=False, default_args=default_args)
+
+In the example above, if the DAG is picked up by the scheduler daemon on 2016-01-02 at 6 AM (or from the
+command line), a single DAG Run will be created, with an ``execution_date`` of 2016-01-01, and the next
+one will be created just after midnight on the morning of 2016-01-03 with an execution date of 2016-01-02.
+
+If the ``dag.catchup`` value had been True instead, the scheduler would have created a DAG Run for each
+completed interval between 2015-12-01 and 2016-01-02 (but not yet one for 2016-01-02, as that interval
+hasn't completed) and the scheduler would execute them sequentially. This behavior is great for atomic
+datasets that can easily be split into periods. Turning catchup off is great if your DAG Runs perform
+backfill internally.
+
+External Triggers
+'''''''''''''''''
+
+Note that ``DAG Runs`` can also be created manually through the CLI by
+running an ``airflow trigger_dag`` command, where you can define a
+specific ``run_id``. The ``DAG Runs`` created externally to the
+scheduler get associated with the trigger's timestamp, and will be displayed
+in the UI alongside scheduled ``DAG Runs``.
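+
+For example, a sketch of triggering the tutorial DAG with an explicit,
+illustrative ``run_id``:
+
+.. code:: bash
+
+    airflow trigger_dag tutorial --run_id manual__2016-01-01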
+
+
+To Keep in Mind
+'''''''''''''''
+* The first ``DAG Run`` is created based on the minimum ``start_date`` for the
+ tasks in your DAG.
+* Subsequent ``DAG Runs`` are created by the scheduler process, based on
+ your DAG's ``schedule_interval``, sequentially.
+* When clearing a set of tasks' state in the hope of getting them to re-run,
+ it is important to keep in mind the ``DAG Run``'s state too, as it defines
+ whether the scheduler should look into triggering tasks for that run.
+
+Here are some of the ways you can **unblock tasks**:
+
+* From the UI, you can **clear** (as in delete the status of) individual task instances from the task instances dialog, while defining whether you want to include the past/future and the upstream/downstream dependencies. Note that a confirmation window comes next and allows you to see the set you are about to clear.
+* The CLI command ``airflow clear -h`` has lots of options when it comes to clearing task instance states, including specifying date ranges, targeting task_ids by specifying a regular expression, flags for including upstream and downstream relatives, and targeting task instances in specific states (``failed``, or ``success``); see the sketch after this list.
+* Marking task instances as successful can be done through the UI. This is mostly to fix false negatives, or for instance when the fix has been applied outside of Airflow.
+* The ``airflow backfill`` CLI subcommand has a ``--mark_success`` flag and allows selecting subsections of the DAG as well as specifying date ranges.
+
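+As a sketch, clearing and marking from the CLI might look like this (the DAG
+and task ids reference the tutorial examples):
+
+.. code:: bash
+
+    # clear the 'templated' task instances of the 'tutorial' DAG over a week
+    airflow clear tutorial -t templated -s 2015-06-01 -e 2015-06-07
+
+    # mark a date range as successful without actually running the tasks
+    airflow backfill tutorial -m -s 2015-06-01 -e 2015-06-07
+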
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/5e574012/_sources/security.rst.txt
----------------------------------------------------------------------
diff --git a/_sources/security.rst.txt b/_sources/security.rst.txt
new file mode 100644
index 0000000..70db606
--- /dev/null
+++ b/_sources/security.rst.txt
@@ -0,0 +1,334 @@
+Security
+========
+
+By default, all gates are open. An easy way to restrict access
+to the web application is to do it at the network level, or by using
+SSH tunnels.
+
+It is however possible to switch on authentication by either using one of the supplied
+backends or creating your own.
+
+Web Authentication
+------------------
+
+Password
+''''''''
+
+One of the simplest mechanisms for authentication is requiring users to specify a password before logging in.
+Password authentication requires the use of the ``password`` subpackage in your requirements file. Passwords
+are hashed with bcrypt before being stored.
+
+.. code-block:: bash
+
+ [webserver]
+ authenticate = True
+ auth_backend = airflow.contrib.auth.backends.password_auth
+
+When password auth is enabled, an initial user credential will need to be created before anyone can log in. An initial
+user was not created in the migrations for this authentication backend, to prevent default Airflow installations from
+attack. Creating a new user has to be done via a Python REPL on the same machine on which Airflow is installed.
+
+.. code-block:: bash
+
+ # navigate to the airflow installation directory
+ $ cd ~/airflow
+ $ python
+ Python 2.7.9 (default, Feb 10 2015, 03:28:08)
+ Type "help", "copyright", "credits" or "license" for more information.
+ >>> import airflow
+ >>> from airflow import models, settings
+ >>> from airflow.contrib.auth.backends.password_auth import PasswordUser
+ >>> user = PasswordUser(models.User())
+ >>> user.username = 'new_user_name'
+ >>> user.email = 'new_user_email@example.com'
+ >>> user.password = 'set_the_password'
+ >>> session = settings.Session()
+ >>> session.add(user)
+ >>> session.commit()
+ >>> session.close()
+ >>> exit()
+
+LDAP
+''''
+
+To turn on LDAP authentication configure your ``airflow.cfg`` as follows. Please note that the example uses
+an encrypted connection to the LDAP server, as you probably do not want passwords to be readable at the network level.
+It is however possible to configure it without encryption if you really want to.
+
+Additionally, if you are using Active Directory, and are not explicitly specifying an OU that your users are in,
+you will need to change ``search_scope`` to "SUBTREE".
+
+Valid search_scope options can be found in the `ldap3 Documentation <http://ldap3.readthedocs.org/searches.html?highlight=search_scope>`_
+
+.. code-block:: bash
+
+ [webserver]
+ authenticate = True
+ auth_backend = airflow.contrib.auth.backends.ldap_auth
+
+ [ldap]
+ # set a connection without encryption: uri = ldap://<your.ldap.server>:<port>
+ uri = ldaps://<your.ldap.server>:<port>
+ user_filter = objectClass=*
+ # in case of Active Directory you would use: user_name_attr = sAMAccountName
+ user_name_attr = uid
+ superuser_filter = memberOf=CN=airflow-super-users,OU=Groups,OU=RWC,OU=US,OU=NORAM,DC=example,DC=com
+ data_profiler_filter = memberOf=CN=airflow-data-profilers,OU=Groups,OU=RWC,OU=US,OU=NORAM,DC=example,DC=com
+ bind_user = cn=Manager,dc=example,dc=com
+ bind_password = insecure
+ basedn = dc=example,dc=com
+ cacert = /etc/ca/ldap_ca.crt
+ # Set search_scope to one of: BASE, LEVEL, SUBTREE
+ # Set search_scope to SUBTREE if using Active Directory, and not specifying an Organizational Unit
+ search_scope = LEVEL
+
+The ``superuser_filter`` and ``data_profiler_filter`` are optional. If defined, these configurations allow you to specify LDAP groups that users must belong to in order to have superuser (admin) and data-profiler permissions. If undefined, all users will be superusers and data profilers.
+
+Roll your own
+'''''''''''''
+
+Airflow uses ``flask_login`` and
+exposes a set of hooks in the ``airflow.default_login`` module. You can
+alter the content, make it part of the ``PYTHONPATH``, and configure it as a backend in ``airflow.cfg``.
+
+.. code-block:: bash
+
+ [webserver]
+ authenticate = True
+ auth_backend = mypackage.auth
+
+Multi-tenancy
+-------------
+
+When authentication is turned on, you can filter the list of DAGs shown in the
+webserver by owner name by setting ``filter_by_owner`` to true under the
+``[webserver]`` section of your ``airflow.cfg``. With this, a user who
+authenticates and logs into the webserver will only see the DAGs they own,
+while a superuser will still be able to see all of them. This makes the web UI
+a multi-tenant UI, where each user can only see the DAGs they created.
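+
+A minimal sketch of the relevant ``airflow.cfg`` entries:
+
+.. code-block:: bash
+
+    [webserver]
+    authenticate = True
+    filter_by_owner = True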
+
+
+Kerberos
+--------
+
+Airflow has initial support for Kerberos. This means that Airflow can renew Kerberos
+tickets for itself and store them in the ticket cache. Hooks and DAGs can make use of these
+tickets to authenticate against kerberized services.
+
+Limitations
+'''''''''''
+
+Please note that at this time not all hooks have been adjusted to make use of this functionality.
+Also, Kerberos is not integrated into the web interface, so you will have to rely on network-level
+security for now to make sure your service remains secure.
+
+Celery integration has not been tried and tested yet. However, if you generate a keytab for every host
+and launch a ticket renewer next to every worker, it will most likely work.
+
+Enabling kerberos
+'''''''''''''''''
+
+Airflow
+^^^^^^^
+
+To enable Kerberos you will need to generate a (service) keytab.
+
+.. code-block:: bash
+
+ # in the kadmin.local or kadmin shell, create the airflow principal
+ kadmin: addprinc -randkey airflow/fully.qualified.domain.name@YOUR-REALM.COM
+
+ # Create the airflow keytab file that will contain the airflow principal
+ kadmin: xst -norandkey -k airflow.keytab airflow/fully.qualified.domain.name
+
+Now store this file in a location where the airflow user can read it (chmod 600), and then add
+the following to your ``airflow.cfg``:
+
+.. code-block:: bash
+
+ [core]
+ security = kerberos
+
+ [kerberos]
+ keytab = /etc/airflow/airflow.keytab
+ reinit_frequency = 3600
+ principal = airflow
+
+Launch the ticket renewer with:
+
+.. code-block:: bash
+
+ # run ticket renewer
+ airflow kerberos
+
+Hadoop
+^^^^^^
+
+If you want to use impersonation, it needs to be enabled in the ``core-site.xml`` of your Hadoop config.
+
+.. code-block:: xml
+
+    <property>
+        <name>hadoop.proxyuser.airflow.groups</name>
+        <value>*</value>
+    </property>
+
+    <property>
+        <name>hadoop.proxyuser.airflow.users</name>
+        <value>*</value>
+    </property>
+
+    <property>
+        <name>hadoop.proxyuser.airflow.hosts</name>
+        <value>*</value>
+    </property>
+
+Of course, if you need to tighten your security, replace the asterisks with something more appropriate.
+
+Using kerberos authentication
+'''''''''''''''''''''''''''''
+
+The hive hook has been updated to take advantage of Kerberos authentication. To allow your DAGs to use it,
+simply update the connection details with, for example:
+
+.. code-block:: bash
+
+ { "use_beeline": true, "principal": "hive/_HOST@EXAMPLE.COM"}
+
+Adjust the principal to your settings. The ``_HOST`` part will be replaced by the fully qualified domain name of
+the server.
+
+You can specify whether you would like to use the DAG owner as the user for the connection, or the user specified
+in the login section of the connection. For the login user, specify the following as extra:
+
+.. code-block:: bash
+
+ { "use_beeline": true, "principal": "hive/_HOST@EXAMPLE.COM", "proxy_user": "login"}
+
+For the DAG owner use:
+
+.. code-block:: bash
+
+ { "use_beeline": true, "principal": "hive/_HOST@EXAMPLE.COM", "proxy_user": "owner"}
+
+and in your DAG, when initializing the HiveOperator, specify:
+
+.. code-block:: bash
+
+ run_as_owner=True
+
+OAuth Authentication
+--------------------
+
+GitHub Enterprise (GHE) Authentication
+''''''''''''''''''''''''''''''''''''''
+
+The GitHub Enterprise authentication backend can be used to authenticate users
+against an installation of GitHub Enterprise using OAuth2. You can optionally
+specify a team whitelist (composed of slug cased team names) to restrict login
+to only members of those teams.
+
+*NOTE* If you do not specify a team whitelist, anyone with a valid account on
+your GHE installation will be able to log in to Airflow.
+
+.. code-block:: bash
+
+ [webserver]
+ authenticate = True
+ auth_backend = airflow.contrib.auth.backends.github_enterprise_auth
+
+ [github_enterprise]
+ host = github.example.com
+ client_id = oauth_key_from_github_enterprise
+ client_secret = oauth_secret_from_github_enterprise
+ oauth_callback_route = /example/ghe_oauth/callback
+ allowed_teams = 1, 345, 23
+
+Setting up GHE Authentication
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+An application must be set up in GHE before you can use the GHE authentication
+backend. In order to set up an application:
+
+1. Navigate to your GHE profile
+2. Select 'Applications' from the left hand nav
+3. Select the 'Developer Applications' tab
+4. Click 'Register new application'
+5. Fill in the required information (the 'Authorization callback URL' must be fully qualified e.g. http://airflow.example.com/example/ghe_oauth/callback)
+6. Click 'Register application'
+7. Copy 'Client ID', 'Client Secret', and your callback route to your airflow.cfg according to the above example
+
+Google Authentication
+'''''''''''''''''''''
+
+The Google authentication backend can be used to authenticate users
+against Google using OAuth2. You must specify a domain to restrict login
+to only members of that domain.
+
+.. code-block:: bash
+
+ [webserver]
+ authenticate = True
+ auth_backend = airflow.contrib.auth.backends.google_auth
+
+ [google]
+ client_id = google_client_id
+ client_secret = google_client_secret
+ oauth_callback_route = /oauth2callback
+ domain = example.com
+
+Setting up Google Authentication
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+An application must be set up in the Google API Console before you can use the Google authentication
+backend. In order to set up an application:
+
+1. Navigate to https://console.developers.google.com/apis/
+2. Select 'Credentials' from the left hand nav
+3. Click 'Create credentials' and choose 'OAuth client ID'
+4. Choose 'Web application'
+5. Fill in the required information (the 'Authorized redirect URIs' must be fully qualified e.g. http://airflow.example.com/oauth2callback)
+6. Click 'Create'
+7. Copy 'Client ID', 'Client Secret', and your redirect URI to your airflow.cfg according to the above example
+
+SSL
+---
+
+SSL can be enabled by providing a certificate and key. Once enabled, be sure to use
+"https://" in your browser.
+
+.. code-block:: bash
+
+ [webserver]
+ web_server_ssl_cert = <path to cert>
+ web_server_ssl_key = <path to key>
+
+Enabling SSL will not automatically change the web server port. If you want to use the
+standard port 443, you'll need to configure that too. Be aware that super user privileges
+(or cap_net_bind_service on Linux) are required to listen on port 443.
+
+.. code-block:: bash
+
+ # Optionally, set the server to listen on the standard SSL port.
+ web_server_port = 443
+ base_url = https://<hostname or IP>:443
+
+Impersonation
+'''''''''''''
+
+Airflow has the ability to impersonate a unix user while running task
+instances based on the task's ``run_as_user`` parameter, which takes a user's name.
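+
+As a minimal sketch (the ``batch_user`` unix account is hypothetical and must
+exist on the worker; ``dag`` is assumed to be defined elsewhere):
+
+.. code-block:: python
+
+    from airflow.operators.bash_operator import BashOperator
+
+    t = BashOperator(
+        task_id='run_as_batch_user',
+        bash_command='whoami',
+        run_as_user='batch_user',  # hypothetical unix account to impersonate
+        dag=dag)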
+
+*NOTE* For impersonations to work, Airflow must be run with ``sudo`` as subtasks are run
+with ``sudo -u`` and permissions of files are changed. Furthermore, the unix user needs to
+exist on the worker. Here is what a simple sudoers file entry could look like to achieve
+this, assuming airflow is running as the ``airflow`` user. Note that this means that
+the airflow user must be trusted and treated the same way as the root user.
+
+.. code-block:: none
+
+    airflow ALL=(ALL) NOPASSWD: ALL
+
+Subtasks with impersonation will still log to the same folder, except that the files they
+log to will have their permissions changed such that only the unix user can write to them.
+
+*Default impersonation* To prevent tasks that don't use impersonation from being run with
+``sudo`` privileges, you can set the ``default_impersonation`` config in ``[core]``, which
+sets a default user to impersonate if ``run_as_user`` is not set.
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/5e574012/_sources/start.rst.txt
----------------------------------------------------------------------
diff --git a/_sources/start.rst.txt b/_sources/start.rst.txt
new file mode 100644
index 0000000..cc41d4b
--- /dev/null
+++ b/_sources/start.rst.txt
@@ -0,0 +1,49 @@
+Quick Start
+-----------
+
+The installation is quick and straightforward.
+
+.. code-block:: bash
+
+ # airflow needs a home, ~/airflow is the default,
+ # but you can lay foundation somewhere else if you prefer
+ # (optional)
+ export AIRFLOW_HOME=~/airflow
+
+ # install from pypi using pip
+ pip install airflow
+
+ # initialize the database
+ airflow initdb
+
+ # start the web server, default port is 8080
+ airflow webserver -p 8080
+
+Upon running these commands, Airflow will create the ``$AIRFLOW_HOME`` folder
+and lay down an ``airflow.cfg`` file with defaults that get you going fast. You can
+inspect the file either in ``$AIRFLOW_HOME/airflow.cfg``, or through the UI in
+the ``Admin->Configuration`` menu. The PID file for the webserver will be stored
+in ``$AIRFLOW_HOME/airflow-webserver.pid`` or in ``/run/airflow/webserver.pid``
+if started by systemd.
+
+Out of the box, Airflow uses a sqlite database, which you should outgrow
+fairly quickly since no parallelization is possible using this database
+backend. It works in conjunction with the ``SequentialExecutor`` which will
+only run task instances sequentially. While this is very limiting, it allows
+you to get up and running quickly and take a tour of the UI and the
+command line utilities.
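+
+The relevant defaults in the generated ``airflow.cfg`` look roughly like this
+(the exact sqlite path depends on your ``$AIRFLOW_HOME``):
+
+.. code-block:: bash
+
+    [core]
+    executor = SequentialExecutor
+    sql_alchemy_conn = sqlite:////home/user/airflow/airflow.db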
+
+Here are a few commands that will trigger a few task instances. You should
+be able to see the status of the jobs change in the ``example_bash_operator`` DAG as you
+run the commands below.
+
+.. code-block:: bash
+
+ # run your first task instance
+ airflow run example_bash_operator runme_0 2015-01-01
+ # run a backfill over 2 days
+ airflow backfill example_bash_operator -s 2015-01-01 -e 2015-01-02
+
+What's Next?
+''''''''''''
+From this point, you can head to the :doc:`tutorial` section for further examples or the :doc:`configuration` section if you're ready to get your hands dirty.
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/5e574012/_sources/tutorial.rst.txt
----------------------------------------------------------------------
diff --git a/_sources/tutorial.rst.txt b/_sources/tutorial.rst.txt
new file mode 100644
index 0000000..97bbe11
--- /dev/null
+++ b/_sources/tutorial.rst.txt
@@ -0,0 +1,429 @@
+
+Tutorial
+================
+
+This tutorial walks you through some of the fundamental Airflow concepts,
+objects, and their usage while writing your first pipeline.
+
+Example Pipeline definition
+---------------------------
+
+Here is an example of a basic pipeline definition. Do not worry if this looks
+complicated; a line-by-line explanation follows below.
+
+.. code:: python
+
+ """
+ Code that goes along with the Airflow tutorial located at:
+ https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py
+ """
+ from airflow import DAG
+ from airflow.operators.bash_operator import BashOperator
+ from datetime import datetime, timedelta
+
+
+ default_args = {
+ 'owner': 'airflow',
+ 'depends_on_past': False,
+ 'start_date': datetime(2015, 6, 1),
+ 'email': ['airflow@airflow.com'],
+ 'email_on_failure': False,
+ 'email_on_retry': False,
+ 'retries': 1,
+ 'retry_delay': timedelta(minutes=5),
+ # 'queue': 'bash_queue',
+ # 'pool': 'backfill',
+ # 'priority_weight': 10,
+ # 'end_date': datetime(2016, 1, 1),
+ }
+
+ dag = DAG('tutorial', default_args=default_args)
+
+ # t1, t2 and t3 are examples of tasks created by instantiating operators
+ t1 = BashOperator(
+ task_id='print_date',
+ bash_command='date',
+ dag=dag)
+
+ t2 = BashOperator(
+ task_id='sleep',
+ bash_command='sleep 5',
+ retries=3,
+ dag=dag)
+
+ templated_command = """
+ {% for i in range(5) %}
+ echo "{{ ds }}"
+ echo "{{ macros.ds_add(ds, 7)}}"
+ echo "{{ params.my_param }}"
+ {% endfor %}
+ """
+
+ t3 = BashOperator(
+ task_id='templated',
+ bash_command=templated_command,
+ params={'my_param': 'Parameter I passed in'},
+ dag=dag)
+
+ t2.set_upstream(t1)
+ t3.set_upstream(t1)
+
+
+It's a DAG definition file
+--------------------------
+
+One thing to wrap your head around (it may not be very intuitive for everyone
+at first) is that this Airflow Python script is really
+just a configuration file specifying the DAG's structure as code.
+The actual tasks defined here will run in a different context from
+the context of this script. Different tasks run on different workers
+at different points in time, which means that this script cannot be used
+to cross communicate between tasks. Note that for this
+purpose we have a more advanced feature called ``XCom``.
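+
+As a minimal sketch of ``XCom`` (assuming the ``dag`` object from the pipeline
+above; the task ids and the pushed value are illustrative):
+
+.. code:: python
+
+    from airflow.operators.python_operator import PythonOperator
+
+    def push(**kwargs):
+        # store a small value in the metadata database
+        kwargs['ti'].xcom_push(key='my_key', value=42)
+
+    def pull(**kwargs):
+        # retrieve the value pushed by the other task
+        print(kwargs['ti'].xcom_pull(task_ids='push_task', key='my_key'))
+
+    push_task = PythonOperator(
+        task_id='push_task', python_callable=push,
+        provide_context=True, dag=dag)
+    pull_task = PythonOperator(
+        task_id='pull_task', python_callable=pull,
+        provide_context=True, dag=dag)
+    pull_task.set_upstream(push_task)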
+
+People sometimes think of the DAG definition file as a place where they
+can do some actual data processing - that is not the case at all!
+The script's purpose is to define a DAG object. It needs to evaluate
+quickly (seconds, not minutes) since the scheduler will execute it
+periodically to reflect the changes if any.
+
+
+Importing Modules
+-----------------
+
+An Airflow pipeline is just a Python script that happens to define an
+Airflow DAG object. Let's start by importing the libraries we will need.
+
+.. code:: python
+
+ # The DAG object; we'll need this to instantiate a DAG
+ from airflow import DAG
+
+ # Operators; we need this to operate!
+ from airflow.operators.bash_operator import BashOperator
+
+Default Arguments
+-----------------
+We're about to create a DAG and some tasks, and we have the choice to
+explicitly pass a set of arguments to each task's constructor
+(which would become redundant), or (better!) we can define a dictionary
+of default parameters that we can use when creating tasks.
+
+.. code:: python
+
+    from datetime import datetime, timedelta
+
+    default_args = {
+        'owner': 'airflow',
+        'depends_on_past': False,
+        'start_date': datetime(2015, 6, 1),
+        'email': ['airflow@airflow.com'],
+        'email_on_failure': False,
+        'email_on_retry': False,
+        'retries': 1,
+        'retry_delay': timedelta(minutes=5),
+        # 'queue': 'bash_queue',
+        # 'pool': 'backfill',
+        # 'priority_weight': 10,
+        # 'end_date': datetime(2016, 1, 1),
+    }
+
+For more information about the BaseOperator's parameters and what they do,
+refer to the :py:class:`airflow.models.BaseOperator` documentation.
+
+Also, note that you could easily define different sets of arguments that
+would serve different purposes. An example of that would be to have
+different settings between a production and development environment.
+
+
+Instantiate a DAG
+-----------------
+
+We'll need a DAG object to nest our tasks into. Here we pass a string
+that defines the ``dag_id``, which serves as a unique identifier for your DAG.
+We also pass the default argument dictionary that we just defined and
+define a ``schedule_interval`` of 1 day for the DAG.
+
+.. code:: python
+
+    dag = DAG(
+        'tutorial', default_args=default_args, schedule_interval=timedelta(1))
+
+Tasks
+-----
+Tasks are generated when instantiating operator objects. An object
+instantiated from an operator is called a task. The first argument
+``task_id`` acts as a unique identifier for the task.
+
+.. code:: python
+
+    t1 = BashOperator(
+        task_id='print_date',
+        bash_command='date',
+        dag=dag)
+
+    t2 = BashOperator(
+        task_id='sleep',
+        bash_command='sleep 5',
+        retries=3,
+        dag=dag)
+
+Notice how we pass a mix of operator specific arguments (``bash_command``) and
+an argument common to all operators (``retries``) inherited
+from BaseOperator to the operator's constructor. This is simpler than
+passing every argument for every constructor call. Also, notice that in
+the second task we override the ``retries`` parameter with ``3``.
+
+The precedence rules for a task are as follows:
+
+1. Explicitly passed arguments
+2. Values that exist in the ``default_args`` dictionary
+3. The operator's default value, if one exists
+
+A task must include or inherit the arguments ``task_id`` and ``owner``,
+otherwise Airflow will raise an exception.
+
+Templating with Jinja
+---------------------
+Airflow leverages the power of
+`Jinja Templating <http://jinja.pocoo.org/docs/dev/>`_ and provides
+the pipeline author
+with a set of built-in parameters and macros. Airflow also provides
+hooks for the pipeline author to define their own parameters, macros and
+templates.
+
+This tutorial barely scratches the surface of what you can do with
+templating in Airflow, but the goal of this section is to let you know
+this feature exists, get you familiar with double curly brackets, and
+point to the most common template variable: ``{{ ds }}``.
+
+.. code:: python
+
+ templated_command = """
+ {% for i in range(5) %}
+ echo "{{ ds }}"
+ echo "{{ macros.ds_add(ds, 7) }}"
+ echo "{{ params.my_param }}"
+ {% endfor %}
+ """
+
+ t3 = BashOperator(
+ task_id='templated',
+ bash_command=templated_command,
+ params={'my_param': 'Parameter I passed in'},
+ dag=dag)
+
+Notice that the ``templated_command`` contains code logic in ``{% %}`` blocks,
+references parameters like ``{{ ds }}``, calls a function as in
+``{{ macros.ds_add(ds, 7) }}``, and references a user-defined parameter
+in ``{{ params.my_param }}``.
+
+The ``params`` hook in ``BaseOperator`` allows you to pass a dictionary of
+parameters and/or objects to your templates. Please take the time
+to understand how the parameter ``my_param`` makes it through to the template.
+
+Files can also be passed to the ``bash_command`` argument, like
+``bash_command='templated_command.sh'``, where the file location is relative to
+the directory containing the pipeline file (``tutorial.py`` in this case). This
+may be desirable for many reasons, like separating your script's logic and
+pipeline code, allowing for proper code highlighting in files composed in
+different languages, and general flexibility in structuring pipelines. It is
+also possible to define your ``template_searchpath`` as pointing to any folder
+locations in the DAG constructor call.
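+
+A sketch of both options (the script name and the extra search path are
+hypothetical):
+
+.. code:: python
+
+    dag = DAG(
+        'tutorial', default_args=default_args,
+        schedule_interval=timedelta(1),
+        template_searchpath=['/srv/airflow/templates'])  # extra template folders
+
+    t4 = BashOperator(
+        task_id='templated_file',
+        bash_command='templated_command.sh',  # rendered as a Jinja template
+        params={'my_param': 'Parameter I passed in'},
+        dag=dag)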
+
+For more information on the variables and macros that can be referenced
+in templates, make sure to read through the :ref:`macros` section.
+
+Setting up Dependencies
+-----------------------
+We have two simple tasks that do not depend on each other. Here are a few ways
+you can define dependencies between them:
+
+.. code:: python
+
+ t2.set_upstream(t1)
+
+ # This means that t2 will depend on t1
+ # running successfully to run
+ # It is equivalent to
+ # t1.set_downstream(t2)
+
+ t3.set_upstream(t1)
+
+ # all of this is equivalent to
+ # dag.set_dependency('print_date', 'sleep')
+ # dag.set_dependency('print_date', 'templated')
+
+Note that when executing your script, Airflow will raise exceptions when
+it finds cycles in your DAG or when a dependency is referenced more
+than once.
+
+Recap
+-----
+Alright, so we have a pretty basic DAG. At this point your code should look
+something like this:
+
+.. code:: python
+
+ """
+ Code that goes along with the Airflow located at:
+ http://airflow.readthedocs.org/en/latest/tutorial.html
+ """
+ from airflow import DAG
+ from airflow.operators.bash_operator import BashOperator
+ from datetime import datetime, timedelta
+
+
+ default_args = {
+ 'owner': 'airflow',
+ 'depends_on_past': False,
+ 'start_date': datetime(2015, 6, 1),
+ 'email': ['airflow@airflow.com'],
+ 'email_on_failure': False,
+ 'email_on_retry': False,
+ 'retries': 1,
+ 'retry_delay': timedelta(minutes=5),
+ # 'queue': 'bash_queue',
+ # 'pool': 'backfill',
+ # 'priority_weight': 10,
+ # 'end_date': datetime(2016, 1, 1),
+ }
+
+ dag = DAG(
+ 'tutorial', default_args=default_args, schedule_interval=timedelta(1))
+
+ # t1, t2 and t3 are examples of tasks created by instantiating operators
+ t1 = BashOperator(
+ task_id='print_date',
+ bash_command='date',
+ dag=dag)
+
+ t2 = BashOperator(
+ task_id='sleep',
+ bash_command='sleep 5',
+ retries=3,
+ dag=dag)
+
+ templated_command = """
+ {% for i in range(5) %}
+ echo "{{ ds }}"
+ echo "{{ macros.ds_add(ds, 7)}}"
+ echo "{{ params.my_param }}"
+ {% endfor %}
+ """
+
+ t3 = BashOperator(
+ task_id='templated',
+ bash_command=templated_command,
+ params={'my_param': 'Parameter I passed in'},
+ dag=dag)
+
+ t2.set_upstream(t1)
+ t3.set_upstream(t1)
+
+Testing
+--------
+
+Running the Script
+''''''''''''''''''
+
+Time to run some tests. First let's make sure that the pipeline
+parses. Let's assume we're saving the code from the previous step in
+``tutorial.py`` in the DAGs folder referenced in your ``airflow.cfg``.
+The default location for your DAGs is ``~/airflow/dags``.
+
+.. code-block:: bash
+
+ python ~/airflow/dags/tutorial.py
+
+If the script does not raise an exception it means that you haven't done
+anything horribly wrong, and that your Airflow environment is somewhat
+sound.
+
+Command Line Metadata Validation
+'''''''''''''''''''''''''''''''''
+Let's run a few commands to validate this script further.
+
+.. code-block:: bash
+
+ # print the list of active DAGs
+ airflow list_dags
+
+ # prints the list of tasks in the "tutorial" DAG
+ airflow list_tasks tutorial
+
+ # prints the hierarchy of tasks in the tutorial DAG
+ airflow list_tasks tutorial --tree
+
+
+Testing
+'''''''
+Let's test by running the actual task instances for a specific date. The
+date specified in this context is an ``execution_date``, which simulates the
+scheduler running your task or DAG at a specific date and time:
+
+.. code-block:: bash
+
+ # command layout: command subcommand dag_id task_id date
+
+ # testing print_date
+ airflow test tutorial print_date 2015-06-01
+
+ # testing sleep
+ airflow test tutorial sleep 2015-06-01
+
+Now remember what we did with templating earlier? See how this template
+gets rendered and executed by running this command:
+
+.. code-block:: bash
+
+ # testing templated
+ airflow test tutorial templated 2015-06-01
+
+This should result in displaying a verbose log of events and ultimately
+running your bash command and printing the result.
+
+Note that the ``airflow test`` command runs task instances locally, outputs
+their log to stdout (on screen), doesn't bother with dependencies, and
+doesn't communicate state (running, success, failed, ...) to the database.
+It simply allows testing a single task instance.
+
+Backfill
+''''''''
+Everything looks like it's running fine so let's run a backfill.
+``backfill`` will respect your dependencies, emit logs into files and talk to
+the database to record status. If you do have a webserver up, you'll be able
+to track the progress. ``airflow webserver`` will start a web server if you
+are interested in tracking the progress visually as your backfill progresses.
+
+Note that if you use ``depends_on_past=True``, individual task instances
+will depend on the success of the preceding task instance, except for the
+instance stamped with the ``start_date`` itself, for which this dependency is disregarded.
+
+The date range in this context is a ``start_date`` and optionally an ``end_date``,
+which are used to populate the run schedule with task instances from this dag.
+
+.. code-block:: bash
+
+ # optional, start a web server in debug mode in the background
+ # airflow webserver --debug &
+
+ # start your backfill on a date range
+ airflow backfill tutorial -s 2015-06-01 -e 2015-06-07
+
+What's Next?
+-------------
+That's it, you've written, tested and backfilled your very first Airflow
+pipeline. Merging your code into a repository that has a master scheduler
+running against it should get it triggered and run every day.
+
+Here are a few things you might want to do next:
+
+* Take an in-depth tour of the UI - click all the things!
+* Keep reading the docs! Especially the sections on:
+
+ * Command line interface
+ * Operators
+ * Macros
+
+* Write your first pipeline!
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/5e574012/_sources/ui.rst.txt
----------------------------------------------------------------------
diff --git a/_sources/ui.rst.txt b/_sources/ui.rst.txt
new file mode 100644
index 0000000..4b232fa
--- /dev/null
+++ b/_sources/ui.rst.txt
@@ -0,0 +1,102 @@
+UI / Screenshots
+=================
+The Airflow UI makes it easy to monitor and troubleshoot your data pipelines.
+Here's a quick overview of some of the features and visualizations you
+can find in the Airflow UI.
+
+
+DAGs View
+.........
+List of the DAGs in your environment, and a set of shortcuts to useful pages.
+You can see exactly how many tasks succeeded, failed, or are currently
+running at a glance.
+
+------------
+
+.. image:: img/dags.png
+
+------------
+
+
+Tree View
+.........
+A tree representation of the DAG that spans across time. If a pipeline is
+late, you can quickly see where the different steps are and identify
+the blocking ones.
+
+------------
+
+.. image:: img/tree.png
+
+------------
+
+Graph View
+..........
+The graph view is perhaps the most comprehensive. Visualize your DAG's
+dependencies and their current status for a specific run.
+
+------------
+
+.. image:: img/graph.png
+
+------------
+
+Variable View
+.............
+The variable view allows you to list, create, edit or delete the key-value pairs
+of variables used during jobs. By default, the value of a variable is hidden if the
+key contains any of the words ('password', 'secret', 'passwd', 'authorization',
+'api_key', 'apikey', 'access_token'), but it can be configured to show in clear text.
+
+------------
+
+.. image:: img/variable_hidden.png
+
+------------
+
+Gantt Chart
+...........
+The Gantt chart lets you analyse task duration and overlap. You can quickly
+identify bottlenecks and where the bulk of the time is spent for specific
+DAG runs.
+
+------------
+
+.. image:: img/gantt.png
+
+------------
+
+Task Duration
+.............
+The duration of your different tasks over the past N runs. This view lets
+you find outliers and quickly understand where the time is spent in your
+DAG over many runs.
+
+
+------------
+
+.. image:: img/duration.png
+
+------------
+
+Code View
+.........
+Transparency is everything. While the code for your pipeline is in source
+control, this is a quick way to get to the code that generates the DAG and
+provides yet more context.
+
+------------
+
+.. image:: img/code.png
+
+------------
+
+Task Instance Context Menu
+..........................
+From the pages seen above (tree view, graph view, gantt, ...), it is always
+possible to click on a task instance, and get to this rich context menu
+that can take you to more detailed metadata, and perform some actions.
+
+------------
+
+.. image:: img/context.png
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/5e574012/_static/fonts/Inconsolata.ttf
----------------------------------------------------------------------
diff --git a/_static/fonts/Inconsolata.ttf b/_static/fonts/Inconsolata.ttf
new file mode 100644
index 0000000..4b8a36d
Binary files /dev/null and b/_static/fonts/Inconsolata.ttf differ