Posted to commits@airflow.apache.org by bo...@apache.org on 2017/03/13 04:44:59 UTC

[01/45] incubator-airflow git commit: CHANGELOG for 1.8

Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-8-stable 07d40d7cd -> f4760c320


CHANGELOG for 1.8

Closes #2000 from alexvanboxel/pr/changelog


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/8dc27c67
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/8dc27c67
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/8dc27c67

Branch: refs/heads/v1-8-stable
Commit: 8dc27c675a2651e8d4e20f40d9b0a50c7ba5a832
Parents: 7e65998
Author: Alex Van Boxel <al...@vanboxel.be>
Authored: Thu Feb 2 19:40:04 2017 +0100
Committer: Alex Van Boxel <al...@vanboxel.be>
Committed: Thu Feb 2 20:05:09 2017 +0100

----------------------------------------------------------------------
 CHANGELOG.txt | 345 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 345 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8dc27c67/CHANGELOG.txt
----------------------------------------------------------------------
diff --git a/CHANGELOG.txt b/CHANGELOG.txt
index 2cb2f20..8da887c 100644
--- a/CHANGELOG.txt
+++ b/CHANGELOG.txt
@@ -1,3 +1,348 @@
+AIRFLOW 1.8.0, 2017-02-02
+-------------------------
+
+[AIRFLOW-816] Use static nvd3 and d3
+[AIRFLOW-821] Fix py3 compatibility
+[AIRFLOW-817] Check for None value of execution_date in endpoint
+[AIRFLOW-822] Close db before exception
+[AIRFLOW-815] Add prev/next execution dates to template variables
+[AIRFLOW-813] Fix unterminated unit tests in SchedulerJobTest
+[AIRFLOW-813] Fix unterminated scheduler unit tests
+[AIRFLOW-806] UI should properly ignore DAG doc when it is None
+[AIRFLOW-812] Fix the scheduler termination bug.
+[AIRFLOW-780] Fix dag import errors no longer working
+[AIRFLOW-783] Fix py3 incompatibility in BaseTaskRunner
+[AIRFLOW-810] Correct down_revision dag_id/state index creation
+[AIRFLOW-807] Improve scheduler performance for large DAGs
+[AIRFLOW-798] Check return_code before forcing termination
+[AIRFLOW-139] Let psycopg2 handle autocommit for PostgresHook
+[AIRFLOW-776] Add missing cgroups devel dependency
+[AIRFLOW-777] Fix expression to check if a DagRun is in running state
+[AIRFLOW-785] Don't import CgroupTaskRunner at global scope
+[AIRFLOW-784] Pin funcsigs to 1.0.0
+[AIRFLOW-624] Fix setup.py to not import airflow.version as version
+[AIRFLOW-779] Task should fail with specific message when deleted
+[AIRFLOW-778] Fix completely broken MetastorePartitionSensor
+[AIRFLOW-739] Set pickle_info log to debug
+[AIRFLOW-771] Make S3 logs append instead of clobber
+[AIRFLOW-773] Fix flaky datetime addition in api test
+[AIRFLOW-219][AIRFLOW-398] Cgroups + impersonation
+[AIRFLOW-683] Add jira hook, operator and sensor
+[AIRFLOW-762] Add Google DataProc delete operator
+[AIRFLOW-760] Update systemd config
+[AIRFLOW-759] Use previous dag_run to verify depend_on_past
+[AIRFLOW-757] Set child_process_log_directory default more sensible
+[AIRFLOW-692] Open XCom page to super-admins only
+[AIRFLOW-737] Fix HDFS Sensor directory.
+[AIRFLOW-747] Fix retry_delay not honoured
+[AIRFLOW-558] Add Support for dag.catchup=(True|False) Option
+[AIRFLOW-489] Allow specifying execution date in trigger_dag API
+[AIRFLOW-738] Commit deleted xcom items before insert
+[AIRFLOW-729] Add Google Cloud Dataproc cluster creation operator
+[AIRFLOW-728] Add Google BigQuery table sensor
+[AIRFLOW-741] Log to debug instead of info for app.py
+[AIRFLOW-731] Fix period bug for NamedHivePartitionSensor
+[AIRFLOW-740] Pin jinja2 to < 2.9.0
+[AIRFLOW-663] Improve time units for task performance charts
+[AIRFLOW-665] Fix email attachments
+[AIRFLOW-734] Fix SMTP auth regression when not using user/pass
+[AIRFLOW-702] Fix LDAP Regex Bug
+[AIRFLOW-717] Add Cloud Storage updated sensor
+[AIRFLOW-695] Retries do not execute because dagrun is in FAILED state
+[AIRFLOW-673] Add operational metrics test for SchedulerJob
+[AIRFLOW-727] try_number is not increased
+[AIRFLOW-715] A more efficient HDFS Sensor
+[AIRFLOW-716] Allow AVRO BigQuery load-job without schema
+[AIRFLOW-718] Allow the query URI for DataProc Pig
+Log needs to be part of try/catch block
+[AIRFLOW-721] Descendant process can disappear before termination
+[AIRFLOW-403] Bash operator's kill method leaves underlying processes running
+[AIRFLOW-657] Add AutoCommit Parameter for MSSQL
+[AIRFLOW-641] Improve pull request instructions
+[AIRFLOW-685] Add test for MySqlHook.bulk_load()
+[AIRFLOW-686] Match auth backend config section
+[AIRFLOW-691] Add SSH KeepAlive option to SSH_hook
+[AIRFLOW-709] Use same engine for migrations and reflection
+[AIRFLOW-700] Update to reference to web authentication documentation
+[AIRFLOW-649] Support non-sched DAGs in LatestOnlyOp
+[AIRFLOW-712] Fix AIRFLOW-667 to use proper HTTP error properties
+[AIRFLOW-710] Add OneFineStay as official user
+[AIRFLOW-703][AIRFLOW-1] Stop Xcom being cleared too early
+[AIRFLOW-679] Stop concurrent task instances from running
+[AIRFLOW-704][AIRFLOW-1] Fix invalid syntax in BQ hook
+[AIRFLOW-667] Handle BigQuery 503 error
+[AIRFLOW-680] Disable connection pool for commands
+[AIRFLOW-678] Prevent scheduler from double triggering TIs
+[AIRFLOW-677] Kill task if it fails to heartbeat
+[AIRFLOW-674] Ability to add descriptions for DAGs
+[AIRFLOW-682] Bump MAX_PERIODS to make mark_success work for large DAGs
+Use jdk selector to set required jdk
+[AIRFLOW-647] Restore dag.get_active_runs
+[AIRFLOW-662] Change seasons to months in project description
+[AIRFLOW-656] Add dag/task/date index to xcom table
+[AIRFLOW-658] Improve schema_update_options in GCP
+[AIRFLOW-41] Fix pool oversubscription
+[AIRFLOW-489] Add API Framework
+[AIRFLOW-653] Add some missing endpoint tests
+[AIRFLOW-652] Remove obsolete endpoint
+[AIRFLOW-345] Add contrib ECSOperator
+[AIRFLOW-650] Adding Celect to user list
+[AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
+[AIRFLOW-643] Improve date handling for sf_hook
+[AIRFLOW-638] Add schema_update_options to GCP ops
+[AIRFLOW-640] Install and enable nose-ignore-docstring
+[AIRFLOW-639] Alphasort package names
+[AIRFLOW-375] Fix pylint errors
+[AIRFLOW-347] Show empty DAG runs in tree view
+[AIRFLOW-628] Adding SalesforceHook to contrib/hooks
+[AIRFLOW-514] hive hook loads data from pandas DataFrame into hive and infers types
+[AIRFLOW-565] Fixes DockerOperator on Python3.x
+[AIRFLOW-635] Encryption option for S3 hook
+[AIRFLOW-137] Fix max_active_runs on clearing tasks
+[AIRFLOW-343] Fix schema plumbing in HiveServer2Hook
+[AIRFLOW-130] Fix ssh operator macosx
+[AIRFLOW-633] Show TI attributes in TI view
+[AIRFLOW-626][AIRFLOW-1] HTML Content does not show up when sending email with attachment
+[AIRFLOW-533] Set autocommit via set_autocommit
+[AIRFLOW-629] stop pinning lxml
+[AIRFLOW-464] Add setdefault method to Variable
+[AIRFLOW-626][AIRFLOW-1] HTML Content does not show up when sending email with attachment
+[AIRFLOW-591] Add datadog hook & sensor
+[AIRFLOW-561] Add RedshiftToS3Transfer operator
+[AIRFLOW-570] Pass root to date form on gantt
+[AIRFLOW-504] Store fractional seconds in MySQL tables
+[AIRFLOW-623] LDAP attributes not always a list
+[AIRFLOW-611] source_format in BigQueryBaseCursor
+[AIRFLOW-619] Fix exception in Gannt chart
+[AIRFLOW-618] Cast DateTimes to avoid sqllite errors
+[AIRFLOW-422] Add JSON endpoint for task info
+[AIRFLOW-616][AIRFLOW-617] Minor fixes to PR tool UX
+[AIRFLOW-179] Fix DbApiHook with non-ASCII chars
+[AIRFLOW-566] Add timeout while fetching logs
+[AIRFLOW-615] Set graph glyphicon first
+[AIRFLOW-609] Add application_name to PostgresHook
+[AIRFLOW-604] Revert .first() to .one()
+[AIRFLOW-370] Create AirflowConfigException in exceptions.py
+[AIRFLOW-582] Fixes TI.get_dagrun filter (removes start_date)
+[AIRFLOW-568] Fix double task_stats count if a DagRun is active
+[AIRFLOW-585] Fix race condition in backfill execution loop
+[AIRFLOW-580] Prevent landscape warning on .format
+[AIRFLOW-597] Check if content is None, not false-equivalent
+[AIRFLOW-586] test_dag_v1 fails from 0 to 3 a.m.
+[AIRFLOW-453] Add XCom Admin Page
+[AIRFLOW-588] Add Google Cloud Storage Object sensor
+[AIRFLOW-592] example_xcom import Error
+[AIRFLOW-587] Fix incorrect scope for Google Auth
+[AIRFLOW-589] Add templatable job_name
+[AIRFLOW-227] Show running config in config view
+[AIRFLOW-319] xcom push response in HTTP Operator
+[AIRFLOW-385] Add symlink to latest scheduler log directory
+[AIRFLOW-583] Fix decode error in gcs_to_bq
+[AIRFLOW-96] s3_conn_id using environment variable
+[AIRFLOW-575] Clarify tutorial and FAQ about `schedule_interval` always inheriting from DAG object
+[AIRFLOW-577] Output BigQuery job for improved debugging
+[AIRFLOW-560] Get URI & SQLA engine from Connection
+[AIRFLOW-518] Require DataProfilingMixin for Variables CRUD
+[AIRFLOW-553] Fix load path for filters.js
+[AIRFLOW-554] Add Jinja support to Spark-sql
+[AIRFLOW-550] Make ssl config check empty string safe
+[AIRFLOW-500] Use id for github allowed teams
+[AIRFLOW-556] Add UI PR guidelines
+[AIRFLOW-358][AIRFLOW-430] Add `connections` cli
+[AIRFLOW-548] Load DAGs immediately & continually
+[AIRFLOW-539] Updated BQ hook and BQ operator to support Standard SQL.
+[AIRFLOW-378] Add string casting to params of spark-sql operator
+[AIRFLOW-544] Add Pause/Resume toggle button
+[AIRFLOW-333][AIRFLOW-258] Fix non-module plugin components
+[AIRFLOW-542] Add tooltip to DAGs links icons
+[AIRFLOW-530] Update docs to reflect connection environment var has to be in uppercase
+[AIRFLOW-525] Update template_fields in Qubole Op
+[AIRFLOW-480] Support binary file download from GCS
+[AIRFLOW-198] Implement latest_only_operator
+[AIRFLOW-91] Add SSL config option for the webserver
+[AIRFLOW-191] Fix connection leak with PostgreSQL backend
+[AIRFLOW-512] Fix 'bellow' typo in docs & comments
+[AIRFLOW-509][AIRFLOW-1] Create operator to delete tables in BigQuery
+[AIRFLOW-498] Remove hard-coded gcp project id
+[AIRFLOW-505] Support unicode characters in authors' names
+[AIRFLOW-494] Add per-operator success/failure metrics
+[AIRFLOW-488] Fix test_simple fail
+[AIRFLOW-468] Update pandas requirement to 0.17.1
+[AIRFLOW-159] Add cloud integration section + GCP documentation
+[AIRFLOW-477][AIRFLOW-478] Restructure security section for clarity
+[AIRFLOW-467] Allow defining of project_id in BigQueryHook
+[AIRFLOW-483] Change print to logging statement
+[AIRFLOW-475] make the segment granularity in Druid hook configurable
+
+
+AIRFLOW 1.7.2
+-------------
+
+[AIRFLOW-463] Link Airflow icon to landing page
+[AIRFLOW-149] Task Dependency Engine + Why Isn't My Task Running View
+[AIRFLOW-361] Add default failure handler for the Qubole Operator
+[AIRFLOW-353] Fix dag run status update failure
+[AIRFLOW-447] Store source URIs in Python 3 compatible list
+[AIRFLOW-443] Make module names unique when importing
+[AIRFLOW-444] Add Google authentication backend
+[AIRFLOW-446][AIRFLOW-445] Adds missing dataproc submit options
+[AIRFLOW-431] Add CLI for CRUD operations on pools
+[AIRFLOW-329] Update Dag Overview Page with Better Status Columns
+[AIRFLOW-360] Fix style warnings in models.py
+[AIRFLOW-425] Add white fill for null state tasks in tree view.
+[AIRFLOW-69] Use dag runs in backfill jobs
+[AIRFLOW-415] Make dag_id not found error clearer
+[AIRFLOW-416] Use ordinals in README's company list
+[AIRFLOW-369] Allow setting default DAG orientation
+[AIRFLOW-410] Add 2 Q/A to the FAQ in the docs
+[AIRFLOW-407] Add different colors for some sensors
+[AIRFLOW-414] Improve error message for missing FERNET_KEY
+[AIRFLOW-406] Sphinx/rst fixes
+[AIRFLOW-412] Fix lxml dependency
+[AIRFLOW-413] Fix unset path bug when backfilling via pickle
+[AIRFLOW-78] Airflow clear leaves dag_runs
+[AIRFLOW-402] Remove NamedHivePartitionSensor static check, add docs
+[AIRFLOW-394] Add an option to the Task Duration graph to show cumulative times
+[AIRFLOW-404] Retry download if unpacking fails for hive
+[AIRFLOW-276] Gunicorn rolling restart
+[AIRFLOW-399] Remove dags/testdruid.py
+[AIRFLOW-400] models.py/DAG.set_dag_runs_state() does not correctly set state
+[AIRFLOW-395] Fix colon/equal signs typo for resources in default config
+[AIRFLOW-397] Documentation: Fix typo "instatiating" to "instantiating"
+[AIRFLOW-395] Remove trailing commas from resources in config
+[AIRFLOW-388] Add a new chart for Task_Tries for each DAG
+[AIRFLOW-322] Fix typo in FAQ section
+[AIRFLOW-375] Pylint fixes
+[AIRFLOW-386] Limit scope to user email only
+[AIRFLOW-383] Cleanup example qubole operator dag
+[AIRFLOW-160] Parse DAG files through child processes
+[AIRFLOW-381] Manual UI Dag Run creation: require dag_id field
+[AIRFLOW-373] Enhance CLI variables functionality
+[AIRFLOW-379] Enhance Variables page functionality: import/export variables
+[AIRFLOW-331] Modify the LDAP authentication config lines in 'Security' sample code
+[AIRFLOW-356][AIRFLOW-355][AIRFLOW-354] Replace nobr, enable DAG only exists locally message, change edit DAG icon
+[AIRFLOW-362] Import __future__ division
+[AIRFLOW-359] Pin flask-login to 0.2.11
+[AIRFLOW-261] Add bcc and cc fields to EmailOperator
+[AIRFLOW-348] Fix code style warnings
+[AIRFLOW-349] Add metric for number of zombies killed
+[AIRFLOW-340] Remove unused dependency on Babel
+[AIRFLOW-339] Ability to pass a flower conf file
+[AIRFLOW-341][operators] Add resource requirement attributes to operators
+[AIRFLOW-335] Fix simple style errors/warnings
+[AIRFLOW-337] Add __repr__ to VariableAccessor and VariableJsonAccessor
+[AIRFLOW-334] Fix using undefined variable
+[AIRFLOW-315] Fix blank lines code style warnings
+[AIRFLOW-306] Add Spark-sql Hook and Operator
+[AIRFLOW-327] Add rename method to the FTPHook
+[AIRFLOW-321] Fix a wrong code example about tests/dags
+[AIRFLOW-316] Always check DB state for Backfill Job execution
+[AIRFLOW-264] Adding workload management for Hive
+[AIRFLOW-297] support exponential backoff option for retry delay
+[AIRFLOW-31][AIRFLOW-200] Add note to updating.md
+[AIRFLOW-307] There is no __neq__ python magic method.
+[AIRFLOW-309] Add requirements of develop dependencies to docs
+[AIRFLOW-307] Rename __neq__ to __ne__ python magic method.
+[AIRFLOW-313] Fix code style for sqoop_hook.py
+[AIRFLOW-311] Fix wrong path in CONTRIBUTING.md
+[AIRFLOW-24] DataFlow Java Operator
+[AIRFLOW-308] Add link to refresh DAG within DAG view header
+[AIRFLOW-314] Fix BigQuery cursor run_table_upsert method
+[AIRFLOW-298] Fix incubator disclaimer in docs
+[AIRFLOW-284] HiveServer2Hook fix for cursor scope for get_results
+[AIRFLOW-260] More graceful exit when issues can't be closed
+[AIRFLOW-260] Handle case when no version is found
+[AIRFLOW-228] Handle empty version list in PR tool
+[AIRFLOW-302] Improve default squash commit message
+[AIRFLOW-187] Improve prompt styling
+[AIRFLOW-187] Fix typo in argument name
+[AIRFLOW-187] Move "Close XXX" message to end of squash commit
+[AIRFLOW-247] Add EMR hook, operators and sensors. Add AWS base hook
+[AIRFLOW-301] Fix broken unit test
+[AIRFLOW-100] Add execution_date_fn to ExternalTaskSensor
+[AIRFLOW-282] Remove PR Tool logic that depends on version formatting
+[AIRFLOW-291] Add index for state in TI table
+[AIRFLOW-269] Add some unit tests for PostgreSQL
+[AIRFLOW-296] template_ext is being treated as a string rather than a tuple in qubole operator
+[AIRFLOW-286] Improve FTPHook to implement context manager interface
+[AIRFLOW-243] Create NamedHivePartitionSensor
+[AIRFLOW-246] Improve dag_stats endpoint query
+[AIRFLOW-189] Highlighting of Parent/Child nodes in Graphs
+[AIRFLOW-255] Check dagrun timeout when comparing active runs
+[AIRFLOW-281] Add port to mssql_hook
+[AIRFLOW-285] Use Airflow 2.0 style imports for all remaining hooks/operators
+[AIRFLOW-40] Add LDAP group filtering feature.
+[AIRFLOW-277] Multiple deletions does not work in Task Instances view if using SQLite backend
+[AIRFLOW-200] Make hook/operator imports lazy, and print proper exceptions
+[AIRFLOW-283] Make store_to_xcom_key a templated field in GoogleCloudStorageDownloadOperator
+[AIRFLOW-278] Support utf-8 encoding for SQL
+[AIRFLOW-280] Clean up tmp druid table whether or not an ingestion job succeeds
+[AIRFLOW-274] Add XCom functionality to GoogleCloudStorageDownloadOperator
+[AIRFLOW-273] Create an svg version of the airflow logo.
+[AIRFLOW-275] Update contributing guidelines
+[AIRFLOW-244] Modify hive operator to inject analysis data
+[AIRFLOW-162] Allow variable to be accessible into templates
+[AIRFLOW-248] Add Apache license header to all files
+[AIRFLOW-263] Remove temp backtick file
+[AIRFLOW-252] Raise Sqlite exceptions when deleting tasks instance in WebUI
+[AIRFLOW-180] Fix timeout behavior for sensors
+[AIRFLOW-262] Simplify commands in MANIFEST.in
+[AIRFLOW-31] Add zope dependency
+[AIRFLOW-6] Remove dependency on Highcharts
+[AIRFLOW-234] Make tasks that aren't `running` self-terminate
+[AIRFLOW-256] Fix test_scheduler_reschedule heartrate
+Add Python 3 compatibility fix
+[AIRFLOW-31] Use standard imports for hooks/operators
+[AIRFLOW-173] Initial implementation of FileSensor
+[AIRFLOW-224] Collect orphaned tasks and reschedule them
+[AIRFLOW-239] Fix tests indentation
+[AIRFLOW-225] Better units for task duration graph
+[AIRFLOW-241] Add testing done section to PR template
+[AIRFLOW-222] Show duration of task instances in ui
+[AIRFLOW-231] Do not eval user input in PrestoHook
+[AIRFLOW-216] Add Sqoop Hook and Operator
+[AIRFLOW-171] Add upgrade notes on email and S3 to 1.7.1.2
+[AIRFLOW-238] Make compatible with flask-admin 1.4.1
+[AIRFLOW-230] [HiveServer2Hook] adding multi statements support
+[AIRFLOW-142] setup_env.sh doesn't download hive tarball if hdp is specified as distro
+[AIRFLOW-223] Make parametrable the IP on which Flower binds to
+[AIRFLOW-218] Added option to enable webserver gunicorn access/err logs
+[AIRFLOW-213] Add "Closes #X" phrase to commit messages
+[AIRFLOW-68] Align start_date with the schedule_interval
+[AIRFLOW-9] Improving docs to meet Apache's standards
+[AIRFLOW-131] Make XCom.clear more selective
+[AIRFLOW-214] Fix occasion of detached taskinstance
+[AIRFLOW-206] Add commit to close PR
+[AIRFLOW-206] Always load local log files if they exist
+[AIRFLOW-211] Fix JIRA "resolve" vs "close" behavior
+[AIRFLOW-64] Add note about relative DAGS_FOLDER
+[AIRFLOW-114] Sort plugins dropdown
+[AIRFLOW-209] Add scheduler tests and improve lineage handling
+[AIRFLOW-207] Improve JIRA auth workflow
+[AIRFLOW-187] Improve PR tool UX
+[AIRFLOW-155] Documentation of Qubole Operator
+Optimize and refactor process_dag
+[AIRFLOW-185] Handle empty versions list
+[AIRFLOW-201] Fix for HiveMetastoreHook + kerberos
+[AIRFLOW-202] Fixes stray print line
+[AIRFLOW-196] Fix bug where exception is not handled in HttpSensor
+[AIRFLOW-195] Add toggle support to subdag clearing in the CLI
+[AIRFLOW-23] Support for Google Cloud DataProc
+[AIRFLOW-25] Configuration for Celery always required
+[AIRFLOW-190] Add codecov and remove download count
+[AIRFLOW-168] Correct evaluation of @once schedule
+[AIRFLOW-183] Fetch log from remote when worker returns 4xx/5xx response
+[AIRFLOW-181] Fix failing unpacking of hadoop by redownloading
+[AIRFLOW-176] remove unused formatting key
+[AIRFLOW-167] Add dag_state option in cli
+[AIRFLOW-178] Fix bug so that zip file is detected in DAG folder
+[AIRFLOW-176] Improve PR Tool JIRA workflow
+[AIRFLOW-45] Support Hidden Airflow Variables
+[AIRFLOW-175] Run git-reset before checkout in PR tool
+[AIRFLOW-157] Make PR tool Py3-compat; add JIRA command
+[AIRFLOW-170] Add missing @apply_defaults
+
+
 AIRFLOW 1.7.1, 2016-05-19
 -------------------------
 


[33/45] incubator-airflow git commit: [AIRFLOW-941] Use defined parameters for psycopg2

Posted by bo...@apache.org.
[AIRFLOW-941] Use defined parameters for psycopg2

This works around
https://github.com/psycopg/psycopg2/issues/517 .

Closes #2126 from bolkedebruin/AIRFLOW-941


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/1f3aead5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/1f3aead5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/1f3aead5

Branch: refs/heads/v1-8-stable
Commit: 1f3aead5c486c3576a5df3b6904aa449b8a1d90a
Parents: 4077c6d
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Mon Mar 6 21:03:14 2017 +0100
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:28:48 2017 -0700

----------------------------------------------------------------------
 airflow/hooks/postgres_hook.py | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1f3aead5/airflow/hooks/postgres_hook.py
----------------------------------------------------------------------
diff --git a/airflow/hooks/postgres_hook.py b/airflow/hooks/postgres_hook.py
index 75c8226..750ebbb 100644
--- a/airflow/hooks/postgres_hook.py
+++ b/airflow/hooks/postgres_hook.py
@@ -32,10 +32,17 @@ class PostgresHook(DbApiHook):
         conn = self.get_connection(self.postgres_conn_id)
         conn_args = dict(
             host=conn.host,
-            user=conn.login,
-            password=conn.password,
-            dbname=conn.schema,
-            port=conn.port)
+            dbname=self.schema or conn.schema)
+        # work around for https://github.com/psycopg/psycopg2/issues/517
+        # todo: remove when psycopg2 2.7.1 is released
+        # https://issues.apache.org/jira/browse/AIRFLOW-945
+        if conn.port:
+            conn_args['port'] = conn.port
+        if conn.login:
+            conn_args['user'] = conn.login
+        if conn.password:
+            conn_args['password'] = conn.password
+
         # check for ssl parameters in conn.extra
         for arg_name, arg_val in conn.extra_dejson.items():
             if arg_name in ['sslmode', 'sslcert', 'sslkey', 'sslrootcert', 'sslcrl', 'application_name']:
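
For reference, the pattern the patched hook follows can be sketched in isolation as below. The FakeConn stand-in is purely illustrative (it only mimics the Connection attributes the hook reads) and is not part of Airflow.

from collections import namedtuple

# Illustrative stand-in for an Airflow Connection object.
FakeConn = namedtuple('FakeConn', 'host schema port login password')

def build_conn_args(conn, schema=None):
    # Always-present arguments first.
    conn_args = dict(host=conn.host, dbname=schema or conn.schema)
    # Optional settings are added only when they have a value, so
    # psycopg2 never receives explicit None keyword arguments
    # (the case that psycopg2 issue #517 mishandles).
    if conn.port:
        conn_args['port'] = conn.port
    if conn.login:
        conn_args['user'] = conn.login
    if conn.password:
        conn_args['password'] = conn.password
    return conn_args

conn = FakeConn(host='localhost', schema='airflow', port=None,
                login='airflow', password=None)
print(build_conn_args(conn))
# e.g. {'host': 'localhost', 'dbname': 'airflow', 'user': 'airflow'}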


[09/45] incubator-airflow git commit: Revert "Revert "[AIRFLOW-782] Add support for DataFlowPythonOperator.""

Posted by bo...@apache.org.
Revert "Revert "[AIRFLOW-782] Add support for DataFlowPythonOperator.""

This reverts commit 7e65998a1bedd00e74fa333cfee78ad574aaa849.


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/eddecd59
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/eddecd59
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/eddecd59

Branch: refs/heads/v1-8-stable
Commit: eddecd59d73191904f2f156e53a138e532dc560a
Parents: 8aacc28
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Sun Feb 12 13:10:33 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Sun Feb 12 13:10:33 2017 +0100

----------------------------------------------------------------------
 airflow/contrib/hooks/gcp_dataflow_hook.py     | 33 +++++---
 airflow/contrib/operators/dataflow_operator.py | 85 +++++++++++++++++++--
 tests/contrib/hooks/gcp_dataflow_hook.py       | 56 ++++++++++++++
 tests/contrib/operators/dataflow_operator.py   | 76 ++++++++++++++++++
 4 files changed, 232 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/eddecd59/airflow/contrib/hooks/gcp_dataflow_hook.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/hooks/gcp_dataflow_hook.py b/airflow/contrib/hooks/gcp_dataflow_hook.py
index bd5bd3c..aaa9992 100644
--- a/airflow/contrib/hooks/gcp_dataflow_hook.py
+++ b/airflow/contrib/hooks/gcp_dataflow_hook.py
@@ -24,6 +24,7 @@ from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
 
 
 class _DataflowJob(object):
+
     def __init__(self, dataflow, project_number, name):
         self._dataflow = dataflow
         self._project_number = project_number
@@ -82,7 +83,8 @@ class _DataflowJob(object):
         return self._job
 
 
-class _DataflowJava(object):
+class _Dataflow(object):
+
     def __init__(self, cmd):
         self._proc = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE,
                                       stderr=subprocess.PIPE)
@@ -113,11 +115,12 @@ class _DataflowJava(object):
             else:
                 logging.info("Waiting for DataFlow process to complete.")
         if self._proc.returncode is not 0:
-            raise Exception("DataFlow jar failed with return code {}".format(
+            raise Exception("DataFlow failed with return code {}".format(
                 self._proc.returncode))
 
 
 class DataFlowHook(GoogleCloudBaseHook):
+
     def __init__(self,
                  gcp_conn_id='google_cloud_default',
                  delegate_to=None):
@@ -130,21 +133,27 @@ class DataFlowHook(GoogleCloudBaseHook):
         http_authorized = self._authorize()
         return build('dataflow', 'v1b3', http=http_authorized)
 
+    def _start_dataflow(self, task_id, variables, dataflow, name, command_prefix):
+        cmd = command_prefix + self._build_cmd(task_id, variables, dataflow)
+        _Dataflow(cmd).wait_for_done()
+        _DataflowJob(
+            self.get_conn(), variables['project'], name).wait_for_done()
+
     def start_java_dataflow(self, task_id, variables, dataflow):
         name = task_id + "-" + str(uuid.uuid1())[:8]
-        cmd = self._build_cmd(task_id, variables, dataflow, name)
-        _DataflowJava(cmd).wait_for_done()
-        _DataflowJob(self.get_conn(), variables['project'], name).wait_for_done()
+        variables['jobName'] = name
+        self._start_dataflow(
+            task_id, variables, dataflow, name, ["java", "-jar"])
 
-    def _build_cmd(self, task_id, variables, dataflow, name):
-        command = ["java", "-jar",
-                   dataflow,
-                   "--runner=DataflowPipelineRunner",
-                   "--streaming=false",
-                   "--jobName=" + name]
+    def start_python_dataflow(self, task_id, variables, dataflow, py_options):
+        name = task_id + "-" + str(uuid.uuid1())[:8]
+        variables["job_name"] = name
+        self._start_dataflow(
+            task_id, variables, dataflow, name, ["python"] + py_options)
 
+    def _build_cmd(self, task_id, variables, dataflow):
+        command = [dataflow, "--runner=DataflowPipelineRunner"]
         if variables is not None:
             for attr, value in variables.iteritems():
                 command.append("--" + attr + "=" + value)
-
         return command
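
As a rough, standalone illustration of how the refactored hook assembles a Dataflow command line (the function name and sample values below are made up for this sketch; the real logic lives in DataFlowHook._start_dataflow and _build_cmd):

def build_dataflow_cmd(dataflow, variables, command_prefix):
    # A prefix such as ["java", "-jar"] or ["python"] + py_options,
    # then the pipeline artifact, the runner, and one --key=value
    # flag per entry in `variables`.
    command = list(command_prefix) + [dataflow,
                                      "--runner=DataflowPipelineRunner"]
    if variables:
        for attr, value in variables.items():
            command.append("--" + attr + "=" + value)
    return command

print(build_dataflow_cmd("wordcount.py",
                         {"project": "my-project",
                          "job_name": "wordcount-1a2b3c4d"},
                         ["python", "-m"]))
# e.g. ['python', '-m', 'wordcount.py', '--runner=DataflowPipelineRunner',
#       '--project=my-project', '--job_name=wordcount-1a2b3c4d']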

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/eddecd59/airflow/contrib/operators/dataflow_operator.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/operators/dataflow_operator.py b/airflow/contrib/operators/dataflow_operator.py
index 10a6811..ef49eb6 100644
--- a/airflow/contrib/operators/dataflow_operator.py
+++ b/airflow/contrib/operators/dataflow_operator.py
@@ -13,6 +13,7 @@
 # limitations under the License.
 
 import copy
+import re
 
 from airflow.contrib.hooks.gcp_dataflow_hook import DataFlowHook
 from airflow.models import BaseOperator
@@ -70,9 +71,13 @@ class DataFlowJavaOperator(BaseOperator):
             *args,
             **kwargs):
         """
-        Create a new DataFlowJavaOperator.
+        Create a new DataFlowJavaOperator. Note that both
+        dataflow_default_options and options will be merged to specify pipeline
+        execution parameters, and dataflow_default_options is expected to hold
+        high-level options, for instance, project and zone information, which
+        apply to all dataflow operators in the DAG.
 
-        For more detail on about job submission have a look at the reference:
+        For more detail on job submission have a look at the reference:
 
         https://cloud.google.com/dataflow/pipelines/specifying-exec-params
 
@@ -82,11 +87,12 @@ class DataFlowJavaOperator(BaseOperator):
         :type dataflow_default_options: dict
         :param options: Map of job specific options.
         :type options: dict
-        :param gcp_conn_id: The connection ID to use connecting to Google Cloud Platform.
+        :param gcp_conn_id: The connection ID to use connecting to Google Cloud
+            Platform.
         :type gcp_conn_id: string
         :param delegate_to: The account to impersonate, if any.
-            For this to work, the service account making the request must have domain-wide
-            delegation enabled.
+            For this to work, the service account making the request must have
+            domain-wide delegation enabled.
         :type delegate_to: string
         """
         super(DataFlowJavaOperator, self).__init__(*args, **kwargs)
@@ -101,9 +107,76 @@ class DataFlowJavaOperator(BaseOperator):
         self.options = options
 
     def execute(self, context):
-        hook = DataFlowHook(gcp_conn_id=self.gcp_conn_id, delegate_to=self.delegate_to)
+        hook = DataFlowHook(gcp_conn_id=self.gcp_conn_id,
+                            delegate_to=self.delegate_to)
 
         dataflow_options = copy.copy(self.dataflow_default_options)
         dataflow_options.update(self.options)
 
         hook.start_java_dataflow(self.task_id, dataflow_options, self.jar)
+
+
+class DataFlowPythonOperator(BaseOperator):
+
+    @apply_defaults
+    def __init__(
+            self,
+            py_file,
+            py_options=None,
+            dataflow_default_options=None,
+            options=None,
+            gcp_conn_id='google_cloud_default',
+            delegate_to=None,
+            *args,
+            **kwargs):
+        """
+        Create a new DataFlowPythonOperator. Note that both
+        dataflow_default_options and options will be merged to specify pipeline
+        execution parameters, and dataflow_default_options is expected to hold
+        high-level options, for instance, project and zone information, which
+        apply to all dataflow operators in the DAG.
+
+        For more detail on job submission have a look at the reference:
+
+        https://cloud.google.com/dataflow/pipelines/specifying-exec-params
+
+        :param py_file: Reference to the python dataflow pipeline file, e.g.,
+            /some/local/file/path/to/your/python/pipeline/file.py.
+        :type py_file: string
+        :param py_options: Additional python options.
+        :type py_options: list of strings, e.g., ["-m", "-v"].
+        :param dataflow_default_options: Map of default job options.
+        :type dataflow_default_options: dict
+        :param options: Map of job specific options.
+        :type options: dict
+        :param gcp_conn_id: The connection ID to use connecting to Google Cloud
+            Platform.
+        :type gcp_conn_id: string
+        :param delegate_to: The account to impersonate, if any.
+            For this to work, the service account making the request must have
+            domain-wide  delegation enabled.
+        :type delegate_to: string
+        """
+        super(DataFlowPythonOperator, self).__init__(*args, **kwargs)
+
+        self.py_file = py_file
+        self.py_options = py_options or []
+        self.dataflow_default_options = dataflow_default_options or {}
+        self.options = options or {}
+        self.gcp_conn_id = gcp_conn_id
+        self.delegate_to = delegate_to
+
+    def execute(self, context):
+        """Execute the python dataflow job."""
+        hook = DataFlowHook(gcp_conn_id=self.gcp_conn_id,
+                            delegate_to=self.delegate_to)
+        dataflow_options = self.dataflow_default_options.copy()
+        dataflow_options.update(self.options)
+        # Convert argument names from lowerCamelCase to snake case.
+        camel_to_snake = lambda name: re.sub(
+            r'[A-Z]', lambda x: '_' + x.group(0).lower(), name)
+        formatted_options = {camel_to_snake(key): dataflow_options[key]
+                             for key in dataflow_options}
+        hook.start_python_dataflow(
+            self.task_id, formatted_options,
+            self.py_file, self.py_options)
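
A hedged usage sketch of the new operator as it might appear in a DAG file; the DAG id, project, bucket and file paths are placeholders, not values taken from this commit:

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

dag = DAG('example_dataflow_python',
          default_args={'start_date': datetime(2017, 1, 1)},
          schedule_interval=None)

# Shared, high-level settings go in dataflow_default_options and
# per-task settings in options; execute() merges the two and converts
# lowerCamelCase keys such as stagingLocation to snake_case before the
# hook builds the command line.
wordcount = DataFlowPythonOperator(
    task_id='wordcount',
    py_file='/path/to/wordcount.py',
    py_options=['-m'],
    dataflow_default_options={'project': 'my-gcp-project',
                              'stagingLocation': 'gs://my-bucket/staging'},
    options={'output': 'gs://my-bucket/output'},
    dag=dag)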

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/eddecd59/tests/contrib/hooks/gcp_dataflow_hook.py
----------------------------------------------------------------------
diff --git a/tests/contrib/hooks/gcp_dataflow_hook.py b/tests/contrib/hooks/gcp_dataflow_hook.py
new file mode 100644
index 0000000..797d40c
--- /dev/null
+++ b/tests/contrib/hooks/gcp_dataflow_hook.py
@@ -0,0 +1,56 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+from airflow.contrib.hooks.gcp_dataflow_hook import DataFlowHook
+
+try:
+    from unittest import mock
+except ImportError:
+    try:
+        import mock
+    except ImportError:
+        mock = None
+
+
+TASK_ID = 'test-python-dataflow'
+PY_FILE = 'apache_beam.examples.wordcount'
+PY_OPTIONS = ['-m']
+OPTIONS = {
+    'project': 'test',
+    'staging_location': 'gs://test/staging'
+}
+BASE_STRING = 'airflow.contrib.hooks.gcp_api_base_hook.{}'
+DATAFLOW_STRING = 'airflow.contrib.hooks.gcp_dataflow_hook.{}'
+
+
+def mock_init(self, gcp_conn_id, delegate_to=None):
+    pass
+
+
+class DataFlowHookTest(unittest.TestCase):
+
+    def setUp(self):
+        with mock.patch(BASE_STRING.format('GoogleCloudBaseHook.__init__'),
+                        new=mock_init):
+            self.dataflow_hook = DataFlowHook(gcp_conn_id='test')
+
+    @mock.patch(DATAFLOW_STRING.format('DataFlowHook._start_dataflow'))
+    def test_start_python_dataflow(self, internal_dataflow_mock):
+        self.dataflow_hook.start_python_dataflow(
+            task_id=TASK_ID, variables=OPTIONS,
+            dataflow=PY_FILE, py_options=PY_OPTIONS)
+        internal_dataflow_mock.assert_called_once_with(
+            TASK_ID, OPTIONS, PY_FILE, mock.ANY, ['python'] + PY_OPTIONS)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/eddecd59/tests/contrib/operators/dataflow_operator.py
----------------------------------------------------------------------
diff --git a/tests/contrib/operators/dataflow_operator.py b/tests/contrib/operators/dataflow_operator.py
new file mode 100644
index 0000000..4f887c1
--- /dev/null
+++ b/tests/contrib/operators/dataflow_operator.py
@@ -0,0 +1,76 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+
+from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator
+
+try:
+    from unittest import mock
+except ImportError:
+    try:
+        import mock
+    except ImportError:
+        mock = None
+
+
+TASK_ID = 'test-python-dataflow'
+PY_FILE = 'apache_beam.examples.wordcount'
+PY_OPTIONS = ['-m']
+DEFAULT_OPTIONS = {
+    'project': 'test',
+    'stagingLocation': 'gs://test/staging'
+}
+ADDITIONAL_OPTIONS = {
+    'output': 'gs://test/output'
+}
+
+
+class DataFlowPythonOperatorTest(unittest.TestCase):
+
+    def setUp(self):
+        self.dataflow = DataFlowPythonOperator(
+            task_id=TASK_ID,
+            py_file=PY_FILE,
+            py_options=PY_OPTIONS,
+            dataflow_default_options=DEFAULT_OPTIONS,
+            options=ADDITIONAL_OPTIONS)
+
+    def test_init(self):
+        """Test DataFlowPythonOperator instance is properly initialized."""
+        self.assertEqual(self.dataflow.task_id, TASK_ID)
+        self.assertEqual(self.dataflow.py_file, PY_FILE)
+        self.assertEqual(self.dataflow.py_options, PY_OPTIONS)
+        self.assertEqual(self.dataflow.dataflow_default_options,
+                         DEFAULT_OPTIONS)
+        self.assertEqual(self.dataflow.options,
+                         ADDITIONAL_OPTIONS)
+
+    @mock.patch('airflow.contrib.operators.dataflow_operator.DataFlowHook')
+    def test_exec(self, dataflow_mock):
+        """Test DataFlowHook is created and the right args are passed to
+        start_python_dataflow.
+
+        """
+        start_python_hook = dataflow_mock.return_value.start_python_dataflow
+        self.dataflow.execute(None)
+        assert dataflow_mock.called
+        expected_options = {
+            'project': 'test',
+            'staging_location': 'gs://test/staging',
+            'output': 'gs://test/output'
+        }
+        start_python_hook.assert_called_once_with(TASK_ID, expected_options,
+                                                  PY_FILE, PY_OPTIONS)


[13/45] incubator-airflow git commit: [AIRFLOW-893][AIRFLOW-510] Fix crashing webservers when a dagrun has no start date

Posted by bo...@apache.org.
[AIRFLOW-893][AIRFLOW-510] Fix crashing webservers when a dagrun has no start date

Closes #2094 from aoen/ddavydov/fix_webservers_when_bad_startdate_dag

(cherry picked from commit 1c4508d84806debbedac9c4e12f14031c8a1effd)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/b38df6b8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/b38df6b8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/b38df6b8

Branch: refs/heads/v1-8-stable
Commit: b38df6b8c6fc5eefe14b9594827d6f28092f77f8
Parents: 1c23133
Author: Dan Davydov <da...@airbnb.com>
Authored: Thu Feb 23 22:51:02 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Thu Feb 23 22:51:18 2017 +0100

----------------------------------------------------------------------
 airflow/www/templates/airflow/dags.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b38df6b8/airflow/www/templates/airflow/dags.html
----------------------------------------------------------------------
diff --git a/airflow/www/templates/airflow/dags.html b/airflow/www/templates/airflow/dags.html
index 2cbd12e..5792c6a 100644
--- a/airflow/www/templates/airflow/dags.html
+++ b/airflow/www/templates/airflow/dags.html
@@ -108,7 +108,7 @@
                 <td class="text-nowrap">
                     {% if dag %}
                         {% set last_run = dag.get_last_dagrun(include_externally_triggered=True) %}
-                        {% if last_run %}
+                        {% if last_run and last_run.start_date %}
                             <a href="{{ url_for('airflow.graph', dag_id=last_run.dag_id, execution_date=last_run.execution_date ) }}">
                                 {{ last_run.execution_date.strftime("%Y-%m-%d %H:%M") }}
                             </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>


[41/45] incubator-airflow git commit: Make compatible with 1.8

Posted by bo...@apache.org.
Make compatible with 1.8


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/8df046bf
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/8df046bf
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/8df046bf

Branch: refs/heads/v1-8-stable
Commit: 8df046bfbec670a253139c83c6174bb88f25ee7f
Parents: 2b26a5d
Author: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Authored: Sun Mar 12 10:11:15 2017 -0700
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 10:11:15 2017 -0700

----------------------------------------------------------------------
 tests/executors/__init__.py      | 13 ++++++++
 tests/executors/test_executor.py | 56 +++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8df046bf/tests/executors/__init__.py
----------------------------------------------------------------------
diff --git a/tests/executors/__init__.py b/tests/executors/__init__.py
new file mode 100644
index 0000000..a85b772
--- /dev/null
+++ b/tests/executors/__init__.py
@@ -0,0 +1,13 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8df046bf/tests/executors/test_executor.py
----------------------------------------------------------------------
diff --git a/tests/executors/test_executor.py b/tests/executors/test_executor.py
new file mode 100644
index 0000000..9ec6cd4
--- /dev/null
+++ b/tests/executors/test_executor.py
@@ -0,0 +1,56 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from airflow.executors.base_executor import BaseExecutor
+from airflow.utils.state import State
+
+from airflow import settings
+
+
+class TestExecutor(BaseExecutor):
+    """
+    TestExecutor is used for unit testing purposes.
+    """
+    def __init__(self, do_update=False, *args, **kwargs):
+        self.do_update = do_update
+        self._running = []
+        self.history = []
+
+        super(TestExecutor, self).__init__(*args, **kwargs)
+
+    def execute_async(self, key, command, queue=None):
+        self.logger.debug("{} running task instances".format(len(self.running)))
+        self.logger.debug("{} in queue".format(len(self.queued_tasks)))
+
+    def heartbeat(self):
+        session = settings.Session()
+        if self.do_update:
+            self.history.append(list(self.queued_tasks.values()))
+            while len(self._running) > 0:
+                ti = self._running.pop()
+                ti.set_state(State.SUCCESS, session)
+            for key, val in list(self.queued_tasks.items()):
+                (command, priority, queue, ti) = val
+                ti.set_state(State.RUNNING, session)
+                self._running.append(ti)
+                self.queued_tasks.pop(key)
+
+        session.commit()
+        session.close()
+
+    def terminate(self):
+        pass
+
+    def end(self):
+        self.sync()
+
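
A rough sketch of how this executor might be used in a test, assuming (as elsewhere in this tree) that jobs accept an executor argument and that the module is importable as tests.executors.test_executor; the DAG id and dates are placeholders:

import datetime

from airflow import models
from airflow.jobs import BackfillJob
from tests.executors.test_executor import TestExecutor

dagbag = models.DagBag(include_examples=True)
dag = dagbag.get_dag('example_bash_operator')
dag.clear()

# With do_update=True, each heartbeat() moves queued task instances to
# RUNNING and then to SUCCESS, so the backfill completes without
# launching real worker processes.
job = BackfillJob(dag=dag,
                  start_date=datetime.datetime(2017, 1, 1),
                  end_date=datetime.datetime(2017, 1, 1),
                  executor=TestExecutor(do_update=True))
job.run()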


[10/45] incubator-airflow git commit: [AIRFLOW-869] Refactor mark success functionality

Posted by bo...@apache.org.
[AIRFLOW-869] Refactor mark success functionality

This refactors the mark success functionality into a
more generic function that can set multiple states
and properly drills down on SubDags.

Closes #2085 from bolkedebruin/AIRFLOW-869

(cherry picked from commit 28cfd2c541c12468b3e4f634545dfa31a77b0091)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/563cc9a3
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/563cc9a3
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/563cc9a3

Branch: refs/heads/v1-8-stable
Commit: 563cc9a3c8414725a615a93d3910e7a2dbb94999
Parents: eddecd5
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Fri Feb 17 09:05:41 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Fri Feb 17 09:11:41 2017 +0100

----------------------------------------------------------------------
 airflow/api/common/experimental/mark_tasks.py | 187 ++++++++++++++++++
 airflow/jobs.py                               |   4 +-
 airflow/models.py                             |  18 +-
 airflow/www/templates/airflow/dag.html        |   5 -
 airflow/www/views.py                          | 119 +++---------
 tests/api/__init__.py                         |   2 +
 tests/api/common/__init__.py                  |  13 ++
 tests/api/common/mark_tasks.py                | 211 +++++++++++++++++++++
 tests/core.py                                 |  46 +++--
 tests/dags/test_example_bash_operator.py      |  55 ++++++
 tests/models.py                               |   2 +-
 11 files changed, 536 insertions(+), 126 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/airflow/api/common/experimental/mark_tasks.py
----------------------------------------------------------------------
diff --git a/airflow/api/common/experimental/mark_tasks.py b/airflow/api/common/experimental/mark_tasks.py
new file mode 100644
index 0000000..0ddbf98
--- /dev/null
+++ b/airflow/api/common/experimental/mark_tasks.py
@@ -0,0 +1,187 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import datetime
+
+from airflow.jobs import BackfillJob
+from airflow.models import DagRun, TaskInstance
+from airflow.operators.subdag_operator import SubDagOperator
+from airflow.settings import Session
+from airflow.utils.state import State
+
+from sqlalchemy import or_
+
+
+def _create_dagruns(dag, execution_dates, state, run_id_template):
+    """
+    Infers from the dates which dag runs need to be created and does so.
+    :param dag: the dag to create dag runs for
+    :param execution_dates: list of execution dates to evaluate
+    :param state: the state to set the dag run to
+    :param run_id_template: the template for the run id, to be formatted with the execution date
+    :return: newly created and existing dag runs for the execution dates supplied
+    """
+    # find out if we need to create any dag runs
+    drs = DagRun.find(dag_id=dag.dag_id, execution_date=execution_dates)
+    dates_to_create = list(set(execution_dates) - set([dr.execution_date for dr in drs]))
+
+    for date in dates_to_create:
+        dr = dag.create_dagrun(
+            run_id=run_id_template.format(date.isoformat()),
+            execution_date=date,
+            start_date=datetime.datetime.now(),
+            external_trigger=False,
+            state=state,
+        )
+        drs.append(dr)
+
+    return drs
+
+
+def set_state(task, execution_date, upstream=False, downstream=False,
+              future=False, past=False, state=State.SUCCESS, commit=False):
+    """
+    Set the state of a task instance and if needed its relatives. Can set state
+    for future tasks (calculated from execution_date) and retroactively
+    for past tasks. Will verify integrity of past dag runs in order to create
+    tasks that did not exist. It will not create dag runs that are missing
+    on the schedule (but it will do so for subdag dag runs if needed).
+    :param task: the task from which to work. task.task.dag needs to be set
+    :param execution_date: the execution date from which to start looking
+    :param upstream: Mark all parents (upstream tasks)
+    :param downstream: Mark all siblings (downstream tasks) of task_id, including SubDags
+    :param future: Mark all future tasks on the interval of the dag up until
+        last execution date.
+    :param past: Retroactively mark all tasks starting from start_date of the DAG
+    :param state: State to which the tasks need to be set
+    :param commit: Commit tasks to be altered to the database
+    :return: list of tasks that have been created and updated
+    """
+    assert isinstance(execution_date, datetime.datetime)
+
+    # microseconds are supported by the database, but is not handled
+    # correctly by airflow on e.g. the filesystem and in other places
+    execution_date = execution_date.replace(microsecond=0)
+
+    assert task.dag is not None
+    dag = task.dag
+
+    latest_execution_date = dag.latest_execution_date
+    assert latest_execution_date is not None
+
+    # determine date range of dag runs and tasks to consider
+    end_date = latest_execution_date if future else execution_date
+
+    if 'start_date' in dag.default_args:
+        start_date = dag.default_args['start_date']
+    elif dag.start_date:
+        start_date = dag.start_date
+    else:
+        start_date = execution_date
+
+    start_date = execution_date if not past else start_date
+
+    if dag.schedule_interval == '@once':
+        dates = [start_date]
+    else:
+        dates = dag.date_range(start_date=start_date, end_date=end_date)
+
+    # find relatives (siblings = downstream, parents = upstream) if needed
+    task_ids = [task.task_id]
+    if downstream:
+        relatives = task.get_flat_relatives(upstream=False)
+        task_ids += [t.task_id for t in relatives]
+    if upstream:
+        relatives = task.get_flat_relatives(upstream=True)
+        task_ids += [t.task_id for t in relatives]
+
+    # verify the integrity of the dag runs in case a task was added or removed
+    # set the confirmed execution dates as they might be different
+    # from what was provided
+    confirmed_dates = []
+    drs = DagRun.find(dag_id=dag.dag_id, execution_date=dates)
+    for dr in drs:
+        dr.dag = dag
+        dr.verify_integrity()
+        confirmed_dates.append(dr.execution_date)
+
+    # go through subdagoperators and create dag runs. We will only work
+    # within the scope of the subdag. We won't propagate to the parent dag,
+    # but we will propagate from parent to subdag.
+    session = Session()
+    dags = [dag]
+    sub_dag_ids = []
+    while len(dags) > 0:
+        current_dag = dags.pop()
+        for task_id in task_ids:
+            if not current_dag.has_task(task_id):
+                continue
+
+            current_task = current_dag.get_task(task_id)
+            if isinstance(current_task, SubDagOperator):
+                # this works as a kind of integrity check
+                # it creates missing dag runs for subdagoperators,
+                # maybe this should be moved to dagrun.verify_integrity
+                drs = _create_dagruns(current_task.subdag,
+                                      execution_dates=confirmed_dates,
+                                      state=State.RUNNING,
+                                      run_id_template=BackfillJob.ID_FORMAT_PREFIX)
+
+                for dr in drs:
+                    dr.dag = current_task.subdag
+                    dr.verify_integrity()
+                    if commit:
+                        dr.state = state
+                        session.merge(dr)
+
+                dags.append(current_task.subdag)
+                sub_dag_ids.append(current_task.subdag.dag_id)
+
+    # now look for the task instances that are affected
+    TI = TaskInstance
+
+    # get all tasks of the main dag that will be affected by a state change
+    qry_dag = session.query(TI).filter(
+        TI.dag_id==dag.dag_id,
+        TI.execution_date.in_(confirmed_dates),
+        TI.task_id.in_(task_ids)).filter(
+        or_(TI.state.is_(None),
+            TI.state != state)
+    )
+
+    # get *all* tasks of the sub dags
+    if len(sub_dag_ids) > 0:
+        qry_sub_dag = session.query(TI).filter(
+            TI.dag_id.in_(sub_dag_ids),
+            TI.execution_date.in_(confirmed_dates)).filter(
+            or_(TI.state.is_(None),
+                TI.state != state)
+        )
+
+    if commit:
+        tis_altered = qry_dag.with_for_update().all()
+        if len(sub_dag_ids) > 0:
+            tis_altered += qry_sub_dag.with_for_update().all()
+        for ti in tis_altered:
+            ti.state = state
+        session.commit()
+    else:
+        tis_altered = qry_dag.all()
+        if len(sub_dag_ids) > 0:
+            tis_altered += qry_sub_dag.all()
+
+    session.close()
+
+    return tis_altered
+
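
A hedged sketch of calling the new helper; the dag and task ids refer to the bundled example DAG and the date is a placeholder. Note that set_state expects the dag to already have at least one dag run (it reads dag.latest_execution_date):

import datetime

from airflow import models
from airflow.api.common.experimental.mark_tasks import set_state
from airflow.utils.state import State

dagbag = models.DagBag(include_examples=True)
dag = dagbag.get_dag('example_bash_operator')
task = dag.get_task('run_this_last')

# Dry run: commit=False only returns the task instances that would be
# altered. downstream=True also drills down into SubDags.
altered = set_state(task=task,
                    execution_date=datetime.datetime(2017, 1, 1),
                    upstream=False, downstream=True,
                    future=False, past=False,
                    state=State.SUCCESS, commit=False)
print("%d task instances would be marked success" % len(altered))

# Call again with commit=True to actually persist the new state.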

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/airflow/jobs.py
----------------------------------------------------------------------
diff --git a/airflow/jobs.py b/airflow/jobs.py
index 1362814..3ca0070 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1632,6 +1632,8 @@ class BackfillJob(BaseJob):
     triggers a set of task instance runs, in the right order and lasts for
     as long as it takes for the set of task instance to be completed.
     """
+    ID_PREFIX = 'backfill_'
+    ID_FORMAT_PREFIX = ID_PREFIX + '{0}'
 
     __mapper_args__ = {
         'polymorphic_identity': 'BackfillJob'
@@ -1716,7 +1718,7 @@ class BackfillJob(BaseJob):
 
         active_dag_runs = []
         while next_run_date and next_run_date <= end_date:
-            run_id = 'backfill_' + next_run_date.isoformat()
+            run_id = BackfillJob.ID_FORMAT_PREFIX.format(next_run_date.isoformat())
 
             # check if we are scheduling on top of a already existing dag_run
             # we could find a "scheduled" run instead of a "backfill"

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/airflow/models.py
----------------------------------------------------------------------
diff --git a/airflow/models.py b/airflow/models.py
index b9af58e..ba8d051 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -2317,6 +2317,7 @@ class BaseOperator(object):
         qry = qry.filter(TI.task_id.in_(tasks))
 
         count = qry.count()
+
         clear_task_instances(qry, session)
 
         session.commit()
@@ -2931,13 +2932,11 @@ class DAG(BaseDag, LoggingMixin):
     @property
     def latest_execution_date(self):
         """
-        Returns the latest date for which at least one task instance exists
+        Returns the latest date for which at least one dag run exists
         """
-        TI = TaskInstance
         session = settings.Session()
-        execution_date = session.query(func.max(TI.execution_date)).filter(
-            TI.dag_id == self.dag_id,
-            TI.task_id.in_(self.task_ids)
+        execution_date = session.query(func.max(DagRun.execution_date)).filter(
+            DagRun.dag_id == self.dag_id
         ).scalar()
         session.commit()
         session.close()
@@ -3330,7 +3329,7 @@ class DAG(BaseDag, LoggingMixin):
 
         # add a placeholder row into DagStat table
         if not session.query(DagStat).filter(DagStat.dag_id == self.dag_id).first():
-            session.add(DagStat(dag_id=self.dag_id, state=State.RUNNING, count=0, dirty=True))
+            session.add(DagStat(dag_id=self.dag_id, state=state, count=0, dirty=True))
         session.commit()
         return run
 
@@ -3801,6 +3800,8 @@ class DagRun(Base):
     def set_state(self, state):
         if self._state != state:
             self._state = state
+            # something really weird goes on here: if you try to close the session
+            # dag runs will end up detached
             session = settings.Session()
             DagStat.set_dirty(self.dag_id, session=session)
 
@@ -3859,7 +3860,10 @@ class DagRun(Base):
         if run_id:
             qry = qry.filter(DR.run_id == run_id)
         if execution_date:
-            qry = qry.filter(DR.execution_date == execution_date)
+            if isinstance(execution_date, list):
+                qry = qry.filter(DR.execution_date.in_(execution_date))
+            else:
+                qry = qry.filter(DR.execution_date == execution_date)
         if state:
             qry = qry.filter(DR.state == state)
         if external_trigger:

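This hunk lets callers pass either a single execution_date or a list of dates; judging from
the surrounding filters it sits in DagRun.find, so here is a hedged usage sketch (keyword
names are assumed from the query code above, and a working Session must be configured):

    from datetime import datetime
    from airflow import models
    from airflow.utils.state import State

    dates = [datetime(2017, 1, 1), datetime(2017, 1, 2)]

    # A list now expands into an IN (...) filter; a single datetime still
    # produces the old equality filter.
    runs = models.DagRun.find(dag_id='example_bash_operator',
                              execution_date=dates,
                              state=State.RUNNING)
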
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/airflow/www/templates/airflow/dag.html
----------------------------------------------------------------------
diff --git a/airflow/www/templates/airflow/dag.html b/airflow/www/templates/airflow/dag.html
index b9b1afa..8a4793d 100644
--- a/airflow/www/templates/airflow/dag.html
+++ b/airflow/www/templates/airflow/dag.html
@@ -206,10 +206,6 @@
               type="button" class="btn" data-toggle="button">
               Downstream
             </button>
-            <button id="btn_success_recursive"
-              type="button" class="btn" data-toggle="button">
-              Recursive
-            </button>
           </span>
         </div>
         <div class="modal-footer">
@@ -340,7 +336,6 @@ function updateQueryStringParameter(uri, key, value) {
         "&downstream=" + $('#btn_success_downstream').hasClass('active') +
         "&future=" + $('#btn_success_future').hasClass('active') +
         "&past=" + $('#btn_success_past').hasClass('active') +
-        "&recursive=" + $('#btn_success_recursive').hasClass('active') +
         "&execution_date=" + execution_date +
         "&origin=" + encodeURIComponent(window.location);
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/airflow/www/views.py
----------------------------------------------------------------------
diff --git a/airflow/www/views.py b/airflow/www/views.py
index b80d83e..b98bd74 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -23,7 +23,6 @@ from functools import wraps
 from datetime import datetime, timedelta
 import dateutil.parser
 import copy
-from itertools import chain, product
 import json
 
 import inspect
@@ -1085,6 +1084,7 @@ class Airflow(BaseView):
         origin = request.args.get('origin')
         dag = dagbag.get_dag(dag_id)
         task = dag.get_task(task_id)
+        task.dag = dag
 
         execution_date = request.args.get('execution_date')
         execution_date = dateutil.parser.parse(execution_date)
@@ -1093,110 +1093,39 @@ class Airflow(BaseView):
         downstream = request.args.get('downstream') == "true"
         future = request.args.get('future') == "true"
         past = request.args.get('past') == "true"
-        recursive = request.args.get('recursive') == "true"
-        MAX_PERIODS = 5000
 
-        # Flagging tasks as successful
-        session = settings.Session()
-        task_ids = [task_id]
-        dag_ids = [dag_id]
-        task_id_to_dag = {
-            task_id: dag
-        }
-        end_date = ((dag.latest_execution_date or datetime.now())
-                    if future else execution_date)
+        if not dag:
+            flash("Cannot find DAG: {}".format(dag_id))
+            return redirect(origin)
 
-        if 'start_date' in dag.default_args:
-            start_date = dag.default_args['start_date']
-        elif dag.start_date:
-            start_date = dag.start_date
-        else:
-            start_date = execution_date
+        if not task:
+            flash("Cannot find task {} in DAG {}".format(task_id, dag.dag_id))
+            return redirect(origin)
 
-        start_date = execution_date if not past else start_date
+        from airflow.api.common.experimental.mark_tasks import set_state
 
-        if recursive:
-            recurse_tasks(task, task_ids, dag_ids, task_id_to_dag)
-
-        if downstream:
-            relatives = task.get_flat_relatives(upstream=False)
-            task_ids += [t.task_id for t in relatives]
-            if recursive:
-                recurse_tasks(relatives, task_ids, dag_ids, task_id_to_dag)
-        if upstream:
-            relatives = task.get_flat_relatives(upstream=False)
-            task_ids += [t.task_id for t in relatives]
-            if recursive:
-                recurse_tasks(relatives, task_ids, dag_ids, task_id_to_dag)
-        TI = models.TaskInstance
+        if confirmed:
+            altered = set_state(task=task, execution_date=execution_date,
+                                upstream=upstream, downstream=downstream,
+                                future=future, past=past, state=State.SUCCESS,
+                                commit=True)
 
-        if dag.schedule_interval == '@once':
-            dates = [start_date]
-        else:
-            dates = dag.date_range(start_date, end_date=end_date)
-
-        tis = session.query(TI).filter(
-            TI.dag_id.in_(dag_ids),
-            TI.execution_date.in_(dates),
-            TI.task_id.in_(task_ids)).all()
-        tis_to_change = session.query(TI).filter(
-            TI.dag_id.in_(dag_ids),
-            TI.execution_date.in_(dates),
-            TI.task_id.in_(task_ids),
-            TI.state != State.SUCCESS).all()
-        tasks = list(product(task_ids, dates))
-        tis_to_create = list(
-            set(tasks) -
-            set([(ti.task_id, ti.execution_date) for ti in tis]))
-
-        tis_all_altered = list(chain(
-            [(ti.task_id, ti.execution_date) for ti in tis_to_change],
-            tis_to_create))
-
-        if len(tis_all_altered) > MAX_PERIODS:
-            flash("Too many tasks at once (>{0})".format(
-                MAX_PERIODS), 'error')
+            flash("Marked success on {} task instances".format(len(altered)))
             return redirect(origin)
 
-        if confirmed:
-            for ti in tis_to_change:
-                ti.state = State.SUCCESS
-            session.commit()
+        to_be_altered = set_state(task=task, execution_date=execution_date,
+                                  upstream=upstream, downstream=downstream,
+                                  future=future, past=past, state=State.SUCCESS,
+                                  commit=False)
 
-            for task_id, task_execution_date in tis_to_create:
-                ti = TI(
-                    task=task_id_to_dag[task_id].get_task(task_id),
-                    execution_date=task_execution_date,
-                    state=State.SUCCESS)
-                session.add(ti)
-                session.commit()
+        details = "\n".join([str(t) for t in to_be_altered])
 
-            session.commit()
-            session.close()
-            flash("Marked success on {} task instances".format(
-                len(tis_all_altered)))
-
-            return redirect(origin)
-        else:
-            if not tis_all_altered:
-                flash("No task instances to mark as successful", 'error')
-                response = redirect(origin)
-            else:
-                tis = []
-                for task_id, task_execution_date in tis_all_altered:
-                    tis.append(TI(
-                        task=task_id_to_dag[task_id].get_task(task_id),
-                        execution_date=task_execution_date,
-                        state=State.SUCCESS))
-                details = "\n".join([str(t) for t in tis])
+        response = self.render("airflow/confirm.html",
+                               message=("Here's the list of task instances you are "
+                                        "about to mark as successful:"),
+                               details=details)
 
-                response = self.render(
-                    'airflow/confirm.html',
-                    message=(
-                        "Here's the list of task instances you are about "
-                        "to mark as successful:"),
-                    details=details,)
-            return response
+        return response
 
     @expose('/tree')
     @login_required

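At the HTTP level nothing changes for callers of the endpoint: the first request renders the
confirmation page, and re-requesting with confirmed=true applies the change, now via
set_state. A sketch using the Flask test client, mirroring the WebUiTests changes further
down (the date and DAG id are illustrative):

    from airflow import configuration
    from airflow.www import app as application

    configuration.load_test_config()
    app = application.create_app()
    app.config['TESTING'] = True
    client = app.test_client()

    url = ("/admin/airflow/success?task_id=run_this_last&"
           "dag_id=test_example_bash_operator&upstream=false&downstream=false&"
           "future=false&past=false&execution_date=2017-01-01&"
           "origin=/admin")
    client.get(url)                      # renders the "Here's the list ..." confirmation
    client.get(url + "&confirmed=true")  # actually marks the task instances successful
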
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/tests/api/__init__.py
----------------------------------------------------------------------
diff --git a/tests/api/__init__.py b/tests/api/__init__.py
index 2db97ad..37d59f0 100644
--- a/tests/api/__init__.py
+++ b/tests/api/__init__.py
@@ -15,3 +15,5 @@
 from __future__ import absolute_import
 
 from .client import *
+from .common import *
+

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/tests/api/common/__init__.py
----------------------------------------------------------------------
diff --git a/tests/api/common/__init__.py b/tests/api/common/__init__.py
new file mode 100644
index 0000000..9d7677a
--- /dev/null
+++ b/tests/api/common/__init__.py
@@ -0,0 +1,13 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/tests/api/common/mark_tasks.py
----------------------------------------------------------------------
diff --git a/tests/api/common/mark_tasks.py b/tests/api/common/mark_tasks.py
new file mode 100644
index 0000000..e01f3ad
--- /dev/null
+++ b/tests/api/common/mark_tasks.py
@@ -0,0 +1,211 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+
+from airflow import models
+from airflow.api.common.experimental.mark_tasks import set_state, _create_dagruns
+from airflow.settings import Session
+from airflow.utils.dates import days_ago
+from airflow.utils.state import State
+
+
+DEV_NULL = "/dev/null"
+
+
+class TestMarkTasks(unittest.TestCase):
+    def setUp(self):
+        self.dagbag = models.DagBag(include_examples=True)
+        self.dag1 = self.dagbag.dags['test_example_bash_operator']
+        self.dag2 = self.dagbag.dags['example_subdag_operator']
+
+        self.execution_dates = [days_ago(2), days_ago(1)]
+
+        drs = _create_dagruns(self.dag1, self.execution_dates,
+                              state=State.RUNNING,
+                              run_id_template="scheduled__{}")
+        for dr in drs:
+            dr.dag = self.dag1
+            dr.verify_integrity()
+
+        drs = _create_dagruns(self.dag2,
+                              [self.dag2.default_args['start_date']],
+                              state=State.RUNNING,
+                              run_id_template="scheduled__{}")
+
+        for dr in drs:
+            dr.dag = self.dag2
+            dr.verify_integrity()
+
+        self.session = Session()
+
+    def snapshot_state(self, dag, execution_dates):
+        TI = models.TaskInstance
+        tis = self.session.query(TI).filter(
+            TI.dag_id==dag.dag_id,
+            TI.execution_date.in_(execution_dates)
+        ).all()
+
+        self.session.expunge_all()
+
+        return tis
+
+    def verify_state(self, dag, task_ids, execution_dates, state, old_tis):
+        TI = models.TaskInstance
+
+        tis = self.session.query(TI).filter(
+            TI.dag_id==dag.dag_id,
+            TI.execution_date.in_(execution_dates)
+        ).all()
+
+        self.assertTrue(len(tis) > 0)
+
+        for ti in tis:
+            if ti.task_id in task_ids and ti.execution_date in execution_dates:
+                self.assertEqual(ti.state, state)
+            else:
+                for old_ti in old_tis:
+                    if (old_ti.task_id == ti.task_id
+                            and old_ti.execution_date == ti.execution_date):
+                            self.assertEqual(ti.state, old_ti.state)
+
+    def test_mark_tasks_now(self):
+        # set one task to success but do not commit
+        snapshot = self.snapshot_state(self.dag1, self.execution_dates)
+        task = self.dag1.get_task("runme_1")
+        altered = set_state(task=task, execution_date=self.execution_dates[0],
+                            upstream=False, downstream=False, future=False,
+                            past=False, state=State.SUCCESS, commit=False)
+        self.assertEqual(len(altered), 1)
+        self.verify_state(self.dag1, [task.task_id], [self.execution_dates[0]],
+                          None, snapshot)
+
+        # set one and only one task to success
+        altered = set_state(task=task, execution_date=self.execution_dates[0],
+                            upstream=False, downstream=False, future=False,
+                            past=False, state=State.SUCCESS, commit=True)
+        self.assertEqual(len(altered), 1)
+        self.verify_state(self.dag1, [task.task_id], [self.execution_dates[0]],
+                          State.SUCCESS, snapshot)
+
+        # set no tasks
+        altered = set_state(task=task, execution_date=self.execution_dates[0],
+                            upstream=False, downstream=False, future=False,
+                            past=False, state=State.SUCCESS, commit=True)
+        self.assertEqual(len(altered), 0)
+        self.verify_state(self.dag1, [task.task_id], [self.execution_dates[0]],
+                          State.SUCCESS, snapshot)
+
+        # set task to other than success
+        altered = set_state(task=task, execution_date=self.execution_dates[0],
+                            upstream=False, downstream=False, future=False,
+                            past=False, state=State.FAILED, commit=True)
+        self.assertEqual(len(altered), 1)
+        self.verify_state(self.dag1, [task.task_id], [self.execution_dates[0]],
+                          State.FAILED, snapshot)
+
+        # dont alter other tasks
+        snapshot = self.snapshot_state(self.dag1, self.execution_dates)
+        task = self.dag1.get_task("runme_0")
+        altered = set_state(task=task, execution_date=self.execution_dates[0],
+                            upstream=False, downstream=False, future=False,
+                            past=False, state=State.SUCCESS, commit=True)
+        self.assertEqual(len(altered), 1)
+        self.verify_state(self.dag1, [task.task_id], [self.execution_dates[0]],
+                          State.SUCCESS, snapshot)
+
+    def test_mark_downstream(self):
+        # test downstream
+        snapshot = self.snapshot_state(self.dag1, self.execution_dates)
+        task = self.dag1.get_task("runme_1")
+        relatives = task.get_flat_relatives(upstream=False)
+        task_ids = [t.task_id for t in relatives]
+        task_ids.append(task.task_id)
+
+        altered = set_state(task=task, execution_date=self.execution_dates[0],
+                            upstream=False, downstream=True, future=False,
+                            past=False, state=State.SUCCESS, commit=True)
+        self.assertEqual(len(altered), 3)
+        self.verify_state(self.dag1, task_ids, [self.execution_dates[0]],
+                          State.SUCCESS, snapshot)
+
+    def test_mark_upstream(self):
+        # test upstream
+        snapshot = self.snapshot_state(self.dag1, self.execution_dates)
+        task = self.dag1.get_task("run_after_loop")
+        relatives = task.get_flat_relatives(upstream=True)
+        task_ids = [t.task_id for t in relatives]
+        task_ids.append(task.task_id)
+
+        altered = set_state(task=task, execution_date=self.execution_dates[0],
+                            upstream=True, downstream=False, future=False,
+                            past=False, state=State.SUCCESS, commit=True)
+        self.assertEqual(len(altered), 4)
+        self.verify_state(self.dag1, task_ids, [self.execution_dates[0]],
+                          State.SUCCESS, snapshot)
+
+    def test_mark_tasks_future(self):
+        # set one task to success towards end of scheduled dag runs
+        snapshot = self.snapshot_state(self.dag1, self.execution_dates)
+        task = self.dag1.get_task("runme_1")
+        altered = set_state(task=task, execution_date=self.execution_dates[0],
+                            upstream=False, downstream=False, future=True,
+                            past=False, state=State.SUCCESS, commit=True)
+        self.assertEqual(len(altered), 2)
+        self.verify_state(self.dag1, [task.task_id], self.execution_dates,
+                          State.SUCCESS, snapshot)
+
+    def test_mark_tasks_past(self):
+        # set one task to success towards end of scheduled dag runs
+        snapshot = self.snapshot_state(self.dag1, self.execution_dates)
+        task = self.dag1.get_task("runme_1")
+        altered = set_state(task=task, execution_date=self.execution_dates[1],
+                            upstream=False, downstream=False, future=False,
+                            past=True, state=State.SUCCESS, commit=True)
+        self.assertEqual(len(altered), 2)
+        self.verify_state(self.dag1, [task.task_id], self.execution_dates,
+                          State.SUCCESS, snapshot)
+
+    def test_mark_tasks_subdag(self):
+        # set one task to success towards end of scheduled dag runs
+        task = self.dag2.get_task("section-1")
+        relatives = task.get_flat_relatives(upstream=False)
+        task_ids = [t.task_id for t in relatives]
+        task_ids.append(task.task_id)
+
+        altered = set_state(task=task, execution_date=self.execution_dates[0],
+                            upstream=False, downstream=True, future=False,
+                            past=False, state=State.SUCCESS, commit=True)
+        self.assertEqual(len(altered), 14)
+
+        # cannot use snapshot here as that will require drilling down the
+        # the sub dag tree essentially recreating the same code as in the
+        # tested logic.
+        self.verify_state(self.dag2, task_ids, [self.execution_dates[0]],
+                          State.SUCCESS, [])
+
+    def tearDown(self):
+        self.dag1.clear()
+        self.dag2.clear()
+
+        # just to make sure we are fully cleaned up
+        self.session.query(models.DagRun).delete()
+        self.session.query(models.TaskInstance).delete()
+        self.session.commit()
+
+        self.session.close()
+
+if __name__ == '__main__':
+    unittest.main()

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/tests/core.py
----------------------------------------------------------------------
diff --git a/tests/core.py b/tests/core.py
index 0f7e41d..e35809d 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -1374,6 +1374,7 @@ class CliTests(unittest.TestCase):
         os.remove('variables1.json')
         os.remove('variables2.json')
 
+
 class WebUiTests(unittest.TestCase):
     def setUp(self):
         configuration.load_test_config()
@@ -1383,11 +1384,26 @@ class WebUiTests(unittest.TestCase):
         app.config['TESTING'] = True
         self.app = app.test_client()
 
-        self.dagbag = models.DagBag(
-            dag_folder=DEV_NULL, include_examples=True)
+        self.dagbag = models.DagBag(include_examples=True)
         self.dag_bash = self.dagbag.dags['example_bash_operator']
+        self.dag_bash2 = self.dagbag.dags['test_example_bash_operator']
+        self.sub_dag = self.dagbag.dags['example_subdag_operator']
         self.runme_0 = self.dag_bash.get_task('runme_0')
 
+        self.dag_bash2.create_dagrun(
+            run_id="test_{}".format(models.DagRun.id_for_date(datetime.now())),
+            execution_date=DEFAULT_DATE,
+            start_date=datetime.now(),
+            state=State.RUNNING
+        )
+
+        self.sub_dag.create_dagrun(
+            run_id="test_{}".format(models.DagRun.id_for_date(datetime.now())),
+            execution_date=DEFAULT_DATE,
+            start_date=datetime.now(),
+            state=State.RUNNING
+        )
+
     def test_index(self):
         response = self.app.get('/', follow_redirects=True)
         assert "DAGs" in response.data.decode('utf-8')
@@ -1470,7 +1486,7 @@ class WebUiTests(unittest.TestCase):
         assert "example_bash_operator" in response.data.decode('utf-8')
         url = (
             "/admin/airflow/success?task_id=run_this_last&"
-            "dag_id=example_bash_operator&upstream=false&downstream=false&"
+            "dag_id=test_example_bash_operator&upstream=false&downstream=false&"
             "future=false&past=false&execution_date={}&"
             "origin=/admin".format(DEFAULT_DATE_DS))
         response = self.app.get(url)
@@ -1478,7 +1494,7 @@ class WebUiTests(unittest.TestCase):
         response = self.app.get(url + "&confirmed=true")
         response = self.app.get(
             '/admin/airflow/clear?task_id=run_this_last&'
-            'dag_id=example_bash_operator&future=true&past=false&'
+            'dag_id=test_example_bash_operator&future=true&past=false&'
             'upstream=true&downstream=false&'
             'execution_date={}&'
             'origin=/admin'.format(DEFAULT_DATE_DS))
@@ -1486,7 +1502,7 @@ class WebUiTests(unittest.TestCase):
         url = (
             "/admin/airflow/success?task_id=section-1&"
             "dag_id=example_subdag_operator&upstream=true&downstream=true&"
-            "recursive=true&future=false&past=false&execution_date={}&"
+            "future=false&past=false&execution_date={}&"
             "origin=/admin".format(DEFAULT_DATE_DS))
         response = self.app.get(url)
         assert "Wait a minute" in response.data.decode('utf-8')
@@ -1498,7 +1514,7 @@ class WebUiTests(unittest.TestCase):
         response = self.app.get(url + "&confirmed=true")
         url = (
             "/admin/airflow/clear?task_id=runme_1&"
-            "dag_id=example_bash_operator&future=false&past=false&"
+            "dag_id=test_example_bash_operator&future=false&past=false&"
             "upstream=false&downstream=true&"
             "execution_date={}&"
             "origin=/admin".format(DEFAULT_DATE_DS))
@@ -1542,23 +1558,19 @@ class WebUiTests(unittest.TestCase):
     def test_fetch_task_instance(self):
         url = (
             "/admin/airflow/object/task_instances?"
-            "dag_id=example_bash_operator&"
+            "dag_id=test_example_bash_operator&"
             "execution_date={}".format(DEFAULT_DATE_DS))
         response = self.app.get(url)
-        assert "{}" in response.data.decode('utf-8')
-
-        TI = models.TaskInstance
-        ti = TI(
-            task=self.runme_0, execution_date=DEFAULT_DATE)
-        job = jobs.LocalTaskJob(task_instance=ti, ignore_ti_state=True)
-        job.run()
-
-        response = self.app.get(url)
-        assert "runme_0" in response.data.decode('utf-8')
+        self.assertIn("run_this_last", response.data.decode('utf-8'))
 
     def tearDown(self):
         configuration.conf.set("webserver", "expose_config", "False")
         self.dag_bash.clear(start_date=DEFAULT_DATE, end_date=datetime.now())
+        session = Session()
+        session.query(models.DagRun).delete()
+        session.query(models.TaskInstance).delete()
+        session.commit()
+        session.close()
 
 
 class WebPasswordAuthTest(unittest.TestCase):

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/tests/dags/test_example_bash_operator.py
----------------------------------------------------------------------
diff --git a/tests/dags/test_example_bash_operator.py b/tests/dags/test_example_bash_operator.py
new file mode 100644
index 0000000..ad03353
--- /dev/null
+++ b/tests/dags/test_example_bash_operator.py
@@ -0,0 +1,55 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import airflow
+from builtins import range
+from airflow.operators.bash_operator import BashOperator
+from airflow.operators.dummy_operator import DummyOperator
+from airflow.models import DAG
+from datetime import timedelta
+
+
+args = {
+    'owner': 'airflow',
+    'start_date': airflow.utils.dates.days_ago(2)
+}
+
+dag = DAG(
+    dag_id='test_example_bash_operator', default_args=args,
+    schedule_interval='0 0 * * *',
+    dagrun_timeout=timedelta(minutes=60))
+
+cmd = 'ls -l'
+run_this_last = DummyOperator(task_id='run_this_last', dag=dag)
+
+run_this = BashOperator(
+    task_id='run_after_loop', bash_command='echo 1', dag=dag)
+run_this.set_downstream(run_this_last)
+
+for i in range(3):
+    i = str(i)
+    task = BashOperator(
+        task_id='runme_'+i,
+        bash_command='echo "{{ task_instance_key_str }}" && sleep 1',
+        dag=dag)
+    task.set_downstream(run_this)
+
+task = BashOperator(
+    task_id='also_run_this',
+    bash_command='echo "run_id={{ run_id }} | dag_run={{ dag_run }}"',
+    dag=dag)
+task.set_downstream(run_this_last)
+
+if __name__ == "__main__":
+    dag.cli()

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/563cc9a3/tests/models.py
----------------------------------------------------------------------
diff --git a/tests/models.py b/tests/models.py
index 003fb21..868ea36 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -188,7 +188,7 @@ class DagBagTest(unittest.TestCase):
         class TestDagBag(models.DagBag):
             process_file_calls = 0
             def process_file(self, filepath, only_if_updated=True, safe_mode=True):
-                if 'example_bash_operator.py' in filepath:
+                if 'example_bash_operator.py' == os.path.basename(filepath):
                     TestDagBag.process_file_calls += 1
                 super(TestDagBag, self).process_file(filepath, only_if_updated, safe_mode)
 



[02/45] incubator-airflow git commit: Bump version to 1.8.1alpha0

Posted by bo...@apache.org.
Bump version to 1.8.1alpha0


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/ce3f88b6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/ce3f88b6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/ce3f88b6

Branch: refs/heads/v1-8-stable
Commit: ce3f88b68b926b2bdd2c8d1d0b21113c1d7f246e
Parents: 8dc27c6
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Thu Feb 2 20:24:42 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Thu Feb 2 20:24:42 2017 +0100

----------------------------------------------------------------------
 airflow/version.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ce3f88b6/airflow/version.py
----------------------------------------------------------------------
diff --git a/airflow/version.py b/airflow/version.py
index c31d3f8..8f87df9 100644
--- a/airflow/version.py
+++ b/airflow/version.py
@@ -13,4 +13,4 @@
 # limitations under the License.
 #
 
-version = '1.8.0rc1+apache.incubating'
+version = '1.8.1alpha0'


[40/45] incubator-airflow git commit: [AIRFLOW-900] Double trigger should not kill original task instance

Posted by bo...@apache.org.
[AIRFLOW-900] Double trigger should not kill original task instance

This updates the tests from the earlier AIRFLOW-900 change.

Closes #2146 from bolkedebruin/AIRFLOW-900
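
In short: if a task instance is already recorded as RUNNING under another hostname/pid, a
second LocalTaskJob for the same task instance now refuses to start instead of killing the
original. The following is condensed from the new test_localtaskjob_double_trigger test
below; paths and dates are illustrative and a test database is assumed:

    import socket
    from datetime import datetime

    from airflow import AirflowException, jobs, models, settings
    from airflow.executors import SequentialExecutor
    from airflow.utils.state import State

    DEFAULT_DATE = datetime(2016, 1, 1)

    dagbag = models.DagBag(dag_folder='tests/dags', include_examples=False)
    dag = dagbag.get_dag('test_localtaskjob_double_trigger')
    task = dag.get_task('test_localtaskjob_double_trigger_task')

    session = settings.Session()
    dag.clear()
    dr = dag.create_dagrun(run_id='test', state=State.SUCCESS,
                           execution_date=DEFAULT_DATE, start_date=DEFAULT_DATE,
                           session=session)

    # Pretend the task is already running elsewhere.
    ti = dr.get_task_instance(task_id=task.task_id, session=session)
    ti.state, ti.hostname, ti.pid = State.RUNNING, socket.getfqdn(), 1
    session.commit()

    # The "double trigger": the second job raises, and the original row
    # (pid=1, RUNNING) is left untouched.
    ti_run = models.TaskInstance(task=task, execution_date=DEFAULT_DATE)
    job = jobs.LocalTaskJob(task_instance=ti_run, ignore_ti_state=True,
                            executor=SequentialExecutor())
    try:
        job.run()
    except AirflowException:
        pass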


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/2b26a5d9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/2b26a5d9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/2b26a5d9

Branch: refs/heads/v1-8-stable
Commit: 2b26a5d95ce230b66255c8e7e7388c8013dc6ba6
Parents: 57faa53
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Sat Mar 11 13:42:58 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:36:07 2017 -0700

----------------------------------------------------------------------
 tests/core.py                     | 58 -----------------------
 tests/dags/sleep_forever_dag.py   | 29 ------------
 tests/dags/test_double_trigger.py | 29 ++++++++++++
 tests/jobs.py                     | 86 ++++++++++++++++++++++++++++++++--
 4 files changed, 112 insertions(+), 90 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2b26a5d9/tests/core.py
----------------------------------------------------------------------
diff --git a/tests/core.py b/tests/core.py
index 636ad43..870a0cb 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -896,64 +896,6 @@ class CoreTest(unittest.TestCase):
                 trigger_rule="non_existant",
                 dag=self.dag)
 
-    def test_run_task_twice(self):
-        """If two copies of a TI run, the new one should die, and old should live"""
-        dagbag = models.DagBag(
-            dag_folder=TEST_DAG_FOLDER,
-            include_examples=False,
-        )
-        TI = models.TaskInstance
-        dag = dagbag.dags.get('sleep_forever_dag')
-        task = dag.task_dict.get('sleeps_forever')
-    
-        ti = TI(task=task, execution_date=DEFAULT_DATE)
-        job1 = jobs.LocalTaskJob(
-            task_instance=ti, ignore_ti_state=True, executor=SequentialExecutor())
-        job2 = jobs.LocalTaskJob(
-            task_instance=ti, ignore_ti_state=True, executor=SequentialExecutor())
-
-        p1 = multiprocessing.Process(target=job1.run)
-        p2 = multiprocessing.Process(target=job2.run)
-        try:
-            p1.start()
-            start_time = timetime()
-            sleep(5.0) # must wait for session to be created on p1
-            settings.engine.dispose()
-            session = settings.Session()
-            ti.refresh_from_db(session=session)
-            self.assertEqual(State.RUNNING, ti.state)
-            p1pid = ti.pid
-            settings.engine.dispose()
-            p2.start()
-            p2.join(5) # wait 5 seconds until termination
-            self.assertFalse(p2.is_alive())
-            self.assertTrue(p1.is_alive())
-
-            settings.engine.dispose()
-            session = settings.Session()
-            ti.refresh_from_db(session=session)
-            self.assertEqual(State.RUNNING, ti.state)
-            self.assertEqual(p1pid, ti.pid)
-
-            # check changing hostname kills task
-            ti.refresh_from_db(session=session, lock_for_update=True)
-            ti.hostname = 'nonexistenthostname'
-            session.merge(ti)
-            session.commit()
-
-            p1.join(5)
-            self.assertFalse(p1.is_alive())
-        finally:
-            try:
-                p1.terminate()
-            except AttributeError:
-                pass # process already terminated
-            try:
-                p2.terminate()
-            except AttributeError:
-                pass # process already terminated
-            session.close()
-
     def test_terminate_task(self):
         """If a task instance's db state get deleted, it should fail"""
         TI = models.TaskInstance

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2b26a5d9/tests/dags/sleep_forever_dag.py
----------------------------------------------------------------------
diff --git a/tests/dags/sleep_forever_dag.py b/tests/dags/sleep_forever_dag.py
deleted file mode 100644
index b1f810e..0000000
--- a/tests/dags/sleep_forever_dag.py
+++ /dev/null
@@ -1,29 +0,0 @@
-# -*- coding: utf-8 -*-
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-"""Used for unit tests"""
-import airflow
-from airflow.operators.bash_operator import BashOperator
-from airflow.models import DAG
-
-dag = DAG(
-    dag_id='sleep_forever_dag',
-    schedule_interval=None,
-)
-
-task = BashOperator(
-    task_id='sleeps_forever',
-    dag=dag,
-    bash_command="sleep 10000000000",
-    start_date=airflow.utils.dates.days_ago(2),
-    owner='airflow')

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2b26a5d9/tests/dags/test_double_trigger.py
----------------------------------------------------------------------
diff --git a/tests/dags/test_double_trigger.py b/tests/dags/test_double_trigger.py
new file mode 100644
index 0000000..b58f5c9
--- /dev/null
+++ b/tests/dags/test_double_trigger.py
@@ -0,0 +1,29 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from datetime import datetime
+
+from airflow.models import DAG
+from airflow.operators.dummy_operator import DummyOperator
+
+DEFAULT_DATE = datetime(2016, 1, 1)
+
+args = {
+    'owner': 'airflow',
+    'start_date': DEFAULT_DATE,
+}
+
+dag = DAG(dag_id='test_localtaskjob_double_trigger', default_args=args)
+task = DummyOperator(
+    task_id='test_localtaskjob_double_trigger_task',
+    dag=dag)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2b26a5d9/tests/jobs.py
----------------------------------------------------------------------
diff --git a/tests/jobs.py b/tests/jobs.py
index d208fd4..aee0e9c 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -23,12 +23,13 @@ import os
 import shutil
 import unittest
 import six
-import sys
+import socket
 from tempfile import mkdtemp
 
 from airflow import AirflowException, settings, models
 from airflow.bin import cli
-from airflow.jobs import BackfillJob, SchedulerJob
+from airflow.executors import SequentialExecutor
+from airflow.jobs import BackfillJob, SchedulerJob, LocalTaskJob
 from airflow.models import DAG, DagModel, DagBag, DagRun, Pool, TaskInstance as TI
 from airflow.operators.dummy_operator import DummyOperator
 from airflow.operators.bash_operator import BashOperator
@@ -36,8 +37,12 @@ from airflow.utils.db import provide_session
 from airflow.utils.state import State
 from airflow.utils.timeout import timeout
 from airflow.utils.dag_processing import SimpleDagBag
+
 from mock import patch
-from tests.executor.test_executor import TestExecutor
+from sqlalchemy.orm.session import make_transient
+from tests.executors.test_executor import TestExecutor
+
+from tests.core import TEST_DAG_FOLDER
 
 from airflow import configuration
 configuration.load_test_config()
@@ -344,6 +349,81 @@ class BackfillJobTest(unittest.TestCase):
                 self.assertEqual(State.NONE, ti.state)
 
 
+class LocalTaskJobTest(unittest.TestCase):
+    def setUp(self):
+        pass
+
+    @patch.object(LocalTaskJob, "_is_descendant_process")
+    def test_localtaskjob_heartbeat(self, is_descendant):
+        session = settings.Session()
+        dag = DAG(
+            'test_localtaskjob_heartbeat',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        with dag:
+            op1 = DummyOperator(task_id='op1')
+
+        dag.clear()
+        dr = dag.create_dagrun(run_id="test",
+                               state=State.SUCCESS,
+                               execution_date=DEFAULT_DATE,
+                               start_date=DEFAULT_DATE,
+                               session=session)
+        ti = dr.get_task_instance(task_id=op1.task_id, session=session)
+        ti.state = State.RUNNING
+        ti.hostname = "blablabla"
+        session.commit()
+
+        job1 = LocalTaskJob(task_instance=ti, ignore_ti_state=True, executor=SequentialExecutor())
+        self.assertRaises(AirflowException, job1.heartbeat_callback)
+
+        is_descendant.return_value = True
+        ti.state = State.RUNNING
+        ti.hostname = socket.getfqdn()
+        ti.pid = 1
+        session.merge(ti)
+        session.commit()
+
+        ret = job1.heartbeat_callback()
+        self.assertEqual(ret, None)
+
+        is_descendant.return_value = False
+        self.assertRaises(AirflowException, job1.heartbeat_callback)
+
+    def test_localtaskjob_double_trigger(self):
+        dagbag = models.DagBag(
+            dag_folder=TEST_DAG_FOLDER,
+            include_examples=False,
+        )
+        dag = dagbag.dags.get('test_localtaskjob_double_trigger')
+        task = dag.get_task('test_localtaskjob_double_trigger_task')
+
+        session = settings.Session()
+
+        dag.clear()
+        dr = dag.create_dagrun(run_id="test",
+                               state=State.SUCCESS,
+                               execution_date=DEFAULT_DATE,
+                               start_date=DEFAULT_DATE,
+                               session=session)
+        ti = dr.get_task_instance(task_id=task.task_id, session=session)
+        ti.state = State.RUNNING
+        ti.hostname = socket.getfqdn()
+        ti.pid = 1
+        session.commit()
+
+        ti_run = TI(task=task, execution_date=DEFAULT_DATE)
+        job1 = LocalTaskJob(task_instance=ti_run, ignore_ti_state=True, executor=SequentialExecutor())
+        self.assertRaises(AirflowException, job1.run)
+
+        ti = dr.get_task_instance(task_id=task.task_id, session=session)
+        self.assertEqual(ti.pid, 1)
+        self.assertEqual(ti.state, State.RUNNING)
+
+        session.close()
+
+
 class SchedulerJobTest(unittest.TestCase):
     # These defaults make the test faster to run
     default_scheduler_args = {"file_process_interval": 0,


[32/45] incubator-airflow git commit: [AIRFLOW-719] Prevent DAGs from ending prematurely

Posted by bo...@apache.org.
[AIRFLOW-719] Prevent DAGs from ending prematurely

DAGs using the ALL_SUCCESS and ONE_SUCCESS trigger rules were ending
prematurely when upstream tasks were skipped. With this change, both
trigger rules count SKIPPED upstream tasks as well as SUCCESS ones when
deciding whether a task may run.

Closes #2125 from dhuang/AIRFLOW-719
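
The core of the fix is arithmetic on the upstream state counters. A standalone sketch of
the new ALL_SUCCESS / ONE_SUCCESS checks, simplified from trigger_rule_dep.py in the diff
below rather than imported from it:

    def all_success_ok(upstream, successes, skipped):
        # skipped upstream tasks no longer count as failures
        num_failures = upstream - (successes + skipped)
        return num_failures == 0

    def one_success_ok(successes, skipped):
        # at least one upstream success or skip now satisfies the rule
        return successes > 0 or skipped > 0

    assert all_success_ok(upstream=5, successes=3, skipped=2)
    assert not all_success_ok(upstream=5, successes=3, skipped=1)
    assert one_success_ok(successes=0, skipped=2)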


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/4077c6de
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/4077c6de
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/4077c6de

Branch: refs/heads/v1-8-stable
Commit: 4077c6de297566a4c598065867a9a27324ae6eb1
Parents: 157054e
Author: Daniel Huang <dx...@gmail.com>
Authored: Sat Mar 4 17:33:23 2017 +0100
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:27:30 2017 -0700

----------------------------------------------------------------------
 airflow/ti_deps/deps/trigger_rule_dep.py      |  6 +-
 tests/dags/test_dagrun_short_circuit_false.py | 38 +++++++++++
 tests/models.py                               | 79 +++++++++++++++++++---
 3 files changed, 111 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4077c6de/airflow/ti_deps/deps/trigger_rule_dep.py
----------------------------------------------------------------------
diff --git a/airflow/ti_deps/deps/trigger_rule_dep.py b/airflow/ti_deps/deps/trigger_rule_dep.py
index 281ed51..da13bba 100644
--- a/airflow/ti_deps/deps/trigger_rule_dep.py
+++ b/airflow/ti_deps/deps/trigger_rule_dep.py
@@ -135,7 +135,7 @@ class TriggerRuleDep(BaseTIDep):
             if tr == TR.ALL_SUCCESS:
                 if upstream_failed or failed:
                     ti.set_state(State.UPSTREAM_FAILED, session)
-                elif skipped:
+                elif skipped == upstream:
                     ti.set_state(State.SKIPPED, session)
             elif tr == TR.ALL_FAILED:
                 if successes or skipped:
@@ -148,7 +148,7 @@ class TriggerRuleDep(BaseTIDep):
                     ti.set_state(State.SKIPPED, session)
 
         if tr == TR.ONE_SUCCESS:
-            if successes <= 0:
+            if successes <= 0 and skipped <= 0:
                 yield self._failing_status(
                     reason="Task's trigger rule '{0}' requires one upstream "
                     "task success, but none were found. "
@@ -162,7 +162,7 @@ class TriggerRuleDep(BaseTIDep):
                     "upstream_tasks_state={1}, upstream_task_ids={2}"
                     .format(tr, upstream_tasks_state, task.upstream_task_ids))
         elif tr == TR.ALL_SUCCESS:
-            num_failures = upstream - successes
+            num_failures = upstream - (successes + skipped)
             if num_failures > 0:
                 yield self._failing_status(
                     reason="Task's trigger rule '{0}' requires all upstream "

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4077c6de/tests/dags/test_dagrun_short_circuit_false.py
----------------------------------------------------------------------
diff --git a/tests/dags/test_dagrun_short_circuit_false.py b/tests/dags/test_dagrun_short_circuit_false.py
new file mode 100644
index 0000000..805ab67
--- /dev/null
+++ b/tests/dags/test_dagrun_short_circuit_false.py
@@ -0,0 +1,38 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from datetime import datetime
+
+from airflow.models import DAG
+from airflow.operators.python_operator import ShortCircuitOperator
+from airflow.operators.dummy_operator import DummyOperator
+
+
+# DAG that has its short circuit op fail and skip multiple downstream tasks
+dag = DAG(
+    dag_id='test_dagrun_short_circuit_false',
+    start_date=datetime(2017, 1, 1)
+)
+dag_task1 = ShortCircuitOperator(
+    task_id='test_short_circuit_false',
+    dag=dag,
+    python_callable=lambda: False)
+dag_task2 = DummyOperator(
+    task_id='test_state_skipped1',
+    dag=dag)
+dag_task3 = DummyOperator(
+    task_id='test_state_skipped2',
+    dag=dag)
+dag_task1.set_downstream(dag_task2)
+dag_task2.set_downstream(dag_task3)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4077c6de/tests/models.py
----------------------------------------------------------------------
diff --git a/tests/models.py b/tests/models.py
index 7ca01e7..d904ff3 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -34,6 +34,7 @@ from airflow.ti_deps.deps.trigger_rule_dep import TriggerRuleDep
 from airflow.utils.state import State
 from mock import patch
 from nose_parameterized import parameterized
+from tests.core import TEST_DAG_FOLDER
 
 DEFAULT_DATE = datetime.datetime(2016, 1, 1)
 TEST_DAGS_FOLDER = os.path.join(
@@ -117,13 +118,71 @@ class DagTest(unittest.TestCase):
         self.assertEqual(dag.dag_id, 'creating_dag_in_cm')
         self.assertEqual(dag.tasks[0].task_id, 'op6')
 
+
 class DagRunTest(unittest.TestCase):
+
+    def setUp(self):
+        self.dagbag = models.DagBag(dag_folder=TEST_DAG_FOLDER)
+
+    def create_dag_run(self, dag_id, state=State.RUNNING, task_states=None):
+        now = datetime.datetime.now()
+        dag = self.dagbag.get_dag(dag_id)
+        dag_run = dag.create_dagrun(
+            run_id='manual__' + now.isoformat(),
+            execution_date=now,
+            start_date=now,
+            state=State.RUNNING,
+            external_trigger=False,
+        )
+
+        if task_states is not None:
+            session = settings.Session()
+            for task_id, state in task_states.items():
+                ti = dag_run.get_task_instance(task_id)
+                ti.set_state(state, session)
+            session.close()
+
+        return dag_run
+
     def test_id_for_date(self):
         run_id = models.DagRun.id_for_date(
             datetime.datetime(2015, 1, 2, 3, 4, 5, 6, None))
-        assert run_id == 'scheduled__2015-01-02T03:04:05', (
+        self.assertEqual(
+            'scheduled__2015-01-02T03:04:05', run_id,
             'Generated run_id did not match expectations: {0}'.format(run_id))
 
+    def test_dagrun_running_when_upstream_skipped(self):
+        """
+        Tests that a DAG run is not failed when an upstream task is skipped
+        """
+        initial_task_states = {
+            'test_short_circuit_false': State.SUCCESS,
+            'test_state_skipped1': State.SKIPPED,
+            'test_state_skipped2': State.NONE,
+        }
+        # dags/test_dagrun_short_circuit_false.py
+        dag_run = self.create_dag_run('test_dagrun_short_circuit_false',
+                                      state=State.RUNNING,
+                                      task_states=initial_task_states)
+        updated_dag_state = dag_run.update_state()
+        self.assertEqual(State.RUNNING, updated_dag_state)
+
+    def test_dagrun_success_when_all_skipped(self):
+        """
+        Tests that a DAG run succeeds when all tasks are skipped
+        """
+        initial_task_states = {
+            'test_short_circuit_false': State.SUCCESS,
+            'test_state_skipped1': State.SKIPPED,
+            'test_state_skipped2': State.SKIPPED,
+        }
+        # dags/test_dagrun_short_circuit_false.py
+        dag_run = self.create_dag_run('test_dagrun_short_circuit_false',
+                                      state=State.RUNNING,
+                                      task_states=initial_task_states)
+        updated_dag_state = dag_run.update_state()
+        self.assertEqual(State.SUCCESS, updated_dag_state)
+
 
 class DagBagTest(unittest.TestCase):
 
@@ -501,7 +560,7 @@ class TaskInstanceTest(unittest.TestCase):
         self.assertEqual(dt, ti.end_date+max_delay)
 
     def test_depends_on_past(self):
-        dagbag = models.DagBag()
+        dagbag = models.DagBag(dag_folder=TEST_DAG_FOLDER)
         dag = dagbag.get_dag('test_depends_on_past')
         dag.clear()
         task = dag.tasks[0]
@@ -530,10 +589,11 @@ class TaskInstanceTest(unittest.TestCase):
         #
         # Tests for all_success
         #
-        ['all_success', 5, 0, 0, 0, 0, True, None, True],
-        ['all_success', 2, 0, 0, 0, 0, True, None, False],
-        ['all_success', 2, 0, 1, 0, 0, True, ST.UPSTREAM_FAILED, False],
-        ['all_success', 2, 1, 0, 0, 0, True, ST.SKIPPED, False],
+        ['all_success', 5, 0, 0, 0, 5, True, None, True],
+        ['all_success', 2, 0, 0, 0, 2, True, None, False],
+        ['all_success', 2, 0, 1, 0, 3, True, ST.UPSTREAM_FAILED, False],
+        ['all_success', 2, 1, 0, 0, 3, True, None, False],
+        ['all_success', 0, 5, 0, 0, 5, True, ST.SKIPPED, True],
         #
         # Tests for one_success
         #
@@ -541,6 +601,7 @@ class TaskInstanceTest(unittest.TestCase):
         ['one_success', 2, 0, 0, 0, 2, True, None, True],
         ['one_success', 2, 0, 1, 0, 3, True, None, True],
         ['one_success', 2, 1, 0, 0, 3, True, None, True],
+        ['one_success', 0, 2, 0, 0, 2, True, None, True],
         #
         # Tests for all_failed
         #
@@ -552,9 +613,9 @@ class TaskInstanceTest(unittest.TestCase):
         #
         # Tests for one_failed
         #
-        ['one_failed', 5, 0, 0, 0, 0, True, None, False],
-        ['one_failed', 2, 0, 0, 0, 0, True, None, False],
-        ['one_failed', 2, 0, 1, 0, 0, True, None, True],
+        ['one_failed', 5, 0, 0, 0, 5, True, ST.SKIPPED, False],
+        ['one_failed', 2, 0, 0, 0, 2, True, None, False],
+        ['one_failed', 2, 0, 1, 0, 2, True, None, True],
         ['one_failed', 2, 1, 0, 0, 3, True, None, False],
         ['one_failed', 2, 3, 0, 0, 5, True, ST.SKIPPED, False],
         #


[39/45] incubator-airflow git commit: Fix tests for topological sort

Posted by bo...@apache.org.
Fix tests for topological sort


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/57faa530
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/57faa530
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/57faa530

Branch: refs/heads/v1-8-stable
Commit: 57faa530f7e9580cda9bb0200d40af15d323df24
Parents: 1243ab1
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Sat Mar 11 13:26:39 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:34:53 2017 -0700

----------------------------------------------------------------------
 tests/models.py | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/57faa530/tests/models.py
----------------------------------------------------------------------
diff --git a/tests/models.py b/tests/models.py
index 55117d4..ffd1f31 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -171,10 +171,20 @@ class DagTest(unittest.TestCase):
         topological_list = dag.topological_sort()
         logging.info(topological_list)
 
-        self.assertTrue(topological_list[0] == op5 or topological_list[0] == op4)
-        self.assertTrue(topological_list[1] == op4 or topological_list[1] == op5)
-        self.assertTrue(topological_list[2] == op1 or topological_list[2] == op2)
-        self.assertTrue(topological_list[3] == op1 or topological_list[3] == op2)
+        set1 = [op4, op5]
+        self.assertTrue(topological_list[0] in set1)
+        set1.remove(topological_list[0])
+
+        set2 = [op1, op2]
+        set2.extend(set1)
+        self.assertTrue(topological_list[1] in set2)
+        set2.remove(topological_list[1])
+
+        self.assertTrue(topological_list[2] in set2)
+        set2.remove(topological_list[2])
+
+        self.assertTrue(topological_list[3] in set2)
+
         self.assertTrue(topological_list[4] == op3)
 
         dag = DAG(


[35/45] incubator-airflow git commit: [AIRFLOW-910] Use parallel task execution for backfills

Posted by bo...@apache.org.
[AIRFLOW-910] Use parallel task execution for backfills

The refactor to use dag runs in backfills caused a regression in task
execution performance, because dag runs were executed sequentially. In
addition, backfills were non-deterministic due to the random execution
order of tasks, which caused root tasks to be added to the not-ready
list too soon.

This updates the backfill logic as follows (a short sketch follows
below):
* Parallelize execution of tasks
* Use a leaves-first execution model
* Replace executor-driven state updates with per-task state tracking

Closes #2107 from bolkedebruin/AIRFLOW-910
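
The sketch referred to above: the per-task bookkeeping that replaces the old executor-driven
state updates, reduced to its core. This is a simplified restatement of
BackfillJob._update_counters from the diff below; the real code refreshes each task instance
from the database instead of taking a current_state mapping:

    def update_counters(started, succeeded, skipped, failed, tasks_to_run,
                        current_state):
        # started: TI key -> task instance; current_state: TI key -> state string
        for key, ti in list(started.items()):
            state = current_state[key]
            if state == 'success':
                succeeded.add(key)
                started.pop(key)
            elif state == 'skipped':
                skipped.add(key)
                started.pop(key)
            elif state == 'failed':
                failed.add(key)
                started.pop(key)
            elif state == 'up_for_retry':
                # the task needs another attempt: hand it back to the runner
                started.pop(key)
                tasks_to_run[key] = ti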


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/dcc8ede5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/dcc8ede5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/dcc8ede5

Branch: refs/heads/v1-8-stable
Commit: dcc8ede5c1a2f6819b151dd5ce839f0a0917313a
Parents: 8ffaadf
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Sat Mar 11 09:40:38 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:33:52 2017 -0700

----------------------------------------------------------------------
 airflow/jobs.py                    | 385 +++++++++++++++++---------------
 airflow/models.py                  |  50 +++++
 tests/executor/test_executor.py    |  25 ++-
 tests/jobs.py                      |  48 ++++
 tests/models.py                    |  66 ++++++
 tests/operators/subdag_operator.py |   4 +-
 6 files changed, 393 insertions(+), 185 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/dcc8ede5/airflow/jobs.py
----------------------------------------------------------------------
diff --git a/airflow/jobs.py b/airflow/jobs.py
index fedad55..b6913f3 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -211,6 +211,28 @@ class BaseJob(Base, LoggingMixin):
     def _execute(self):
         raise NotImplementedError("This method needs to be overridden")
 
+    @provide_session
+    def reset_state_for_orphaned_tasks(self, dag_run, session=None):
+        """
+        This function checks for a DagRun if there are any tasks
+        that have a scheduled state but are not known by the
+        executor. If it finds those it will reset the state to None
+        so they will get picked up again.
+        """
+        queued_tis = self.executor.queued_tasks
+
+        # also consider running as the state might not have changed in the db yet
+        running = self.executor.running
+        tis = list()
+        tis.extend(dag_run.get_task_instances(state=State.SCHEDULED, session=session))
+        tis.extend(dag_run.get_task_instances(state=State.QUEUED, session=session))
+
+        for ti in tis:
+            if ti.key not in queued_tis and ti.key not in running:
+                self.logger.debug("Rescheduling orphaned task {}".format(ti))
+                ti.state = State.NONE
+        session.commit()
+
 
 class DagFileProcessor(AbstractDagFileProcessor):
     """Helps call SchedulerJob.process_file() in a separate process."""
@@ -1236,28 +1258,6 @@ class SchedulerJob(BaseJob):
 
         self.logger.info(log_str)
 
-    @provide_session
-    def _reset_state_for_orphaned_tasks(self, dag_run, session=None):
-        """
-        This function checks for a DagRun if there are any tasks
-        that have a scheduled state but are not known by the
-        executor. If it finds those it will reset the state to None
-        so they will get picked up again.
-        """
-        queued_tis = self.executor.queued_tasks
-
-        # also consider running as the state might not have changed in the db yet
-        running = self.executor.running
-        tis = list()
-        tis.extend(dag_run.get_task_instances(state=State.SCHEDULED, session=session))
-        tis.extend(dag_run.get_task_instances(state=State.QUEUED, session=session))
-
-        for ti in tis:
-            if ti.key not in queued_tis and ti.key not in running:
-                self.logger.debug("Rescheduling orphaned task {}".format(ti))
-                ti.state = State.NONE
-        session.commit()
-
     def _execute(self):
         self.logger.info("Starting the scheduler")
         pessimistic_connection_handling()
@@ -1361,7 +1361,7 @@ class SchedulerJob(BaseJob):
         for dr in active_runs:
             self.logger.info("Resetting {} {}".format(dr.dag_id,
                                                       dr.execution_date))
-            self._reset_state_for_orphaned_tasks(dr, session=session)
+            self.reset_state_for_orphaned_tasks(dr, session=session)
 
         session.close()
 
@@ -1663,6 +1663,68 @@ class BackfillJob(BaseJob):
         self.pool = pool
         super(BackfillJob, self).__init__(*args, **kwargs)
 
+    def _update_counters(self, started, succeeded, skipped, failed, tasks_to_run):
+        """
+        Updates the counters per state of the tasks that were running
+        :param started: dict of task instances handed to the executor, keyed by TI key
+        :param succeeded: set of keys of task instances that succeeded
+        :param skipped: set of keys of task instances that were skipped
+        :param failed: set of keys of task instances that failed
+        :param tasks_to_run: dict of task instances still to run, keyed by TI key
+        """
+        for key, ti in list(started.items()):
+            ti.refresh_from_db()
+            if ti.state == State.SUCCESS:
+                succeeded.add(key)
+                self.logger.debug("Task instance {} succeeded. "
+                                  "Don't rerun.".format(ti))
+                started.pop(key)
+                continue
+            elif ti.state == State.SKIPPED:
+                skipped.add(key)
+                self.logger.debug("Task instance {} skipped. "
+                                  "Don't rerun.".format(ti))
+                started.pop(key)
+                continue
+            elif ti.state == State.FAILED:
+                self.logger.error("Task instance {} failed".format(ti))
+                failed.add(key)
+                started.pop(key)
+                continue
+            # special case: if the task needs to run again put it back
+            elif ti.state == State.UP_FOR_RETRY:
+                self.logger.warning("Task instance {} is up for retry"
+                                    .format(ti))
+                started.pop(key)
+                tasks_to_run[key] = ti
+
+    def _manage_executor_state(self, started):
+        """
+        Checks if the executor agrees with the state of task instances
+        that are running
+        :param started: dict of key, task to verify
+        """
+        executor = self.executor
+
+        for key, state in list(executor.get_event_buffer().items()):
+            if key not in started:
+                self.logger.warning("{} state {} not in started={}"
+                                    .format(key, state, started.values()))
+                continue
+
+            ti = started[key]
+            ti.refresh_from_db()
+
+            self.logger.debug("Executor state: {} task {}".format(state, ti))
+
+            if state == State.FAILED or state == State.SUCCESS:
+                if ti.state == State.RUNNING or ti.state == State.QUEUED:
+                    msg = ("Executor reports task instance {} finished ({}) "
+                           "although the task says it's {}. Was the task "
+                           "killed externally?".format(ti, state, ti.state))
+                    self.logger.error(msg)
+                    ti.handle_failure(msg)
+
     def _execute(self):
         """
         Runs a dag for a specified date range.
@@ -1700,13 +1762,12 @@ class BackfillJob(BaseJob):
 
         executor = self.executor
         executor.start()
-        executor_fails = Counter()
 
         # Build a list of all instances to run
         tasks_to_run = {}
         failed = set()
         succeeded = set()
-        started = set()
+        started = {}
         skipped = set()
         not_ready = set()
         deadlocked = set()
@@ -1744,33 +1805,40 @@ class BackfillJob(BaseJob):
             run.state = State.RUNNING
             run.verify_integrity(session=session)
 
+            # check if we have orphaned tasks
+            self.reset_state_for_orphaned_tasks(dag_run=run, session=session)
+
             # for some reason if we dont refresh the reference to run is lost
             run.refresh_from_db()
             make_transient(run)
             active_dag_runs.append(run)
 
+            for ti in run.get_task_instances():
+                # all tasks part of the backfill are scheduled to run
+                ti.set_state(State.SCHEDULED, session=session)
+                tasks_to_run[ti.key] = ti
+
             next_run_date = self.dag.following_schedule(next_run_date)
 
-        run_count = 0
-        for run in active_dag_runs:
-            logging.info("Checking run {}".format(run))
-            run_count = run_count + 1
-
-            def get_task_instances_for_dag_run(dag_run):
-                # this needs a fresh session sometimes tis get detached
-                # can be more finegrained (excluding success or skipped)
-                tasks = {}
-                for ti in dag_run.get_task_instances():
-                    tasks[ti.key] = ti
-                return tasks
-
-            # Triggering what is ready to get triggered
-            while not deadlocked:
-                tasks_to_run = get_task_instances_for_dag_run(run)
-                self.logger.debug("Clearing out not_ready list")
-                not_ready.clear()
+        finished_runs = 0
+        total_runs = len(active_dag_runs)
+
+        # Triggering what is ready to get triggered
+        while (len(tasks_to_run) > 0 or len(started) > 0) and not deadlocked:
+            self.logger.debug("*** Clearing out not_ready list ***")
+            not_ready.clear()
 
+            # we need to execute the tasks bottom to top
+            # or leaf to root, as otherwise tasks might be
+            # determined deadlocked while they are actually
+            # waiting for their upstream to finish
+            for task in self.dag.topological_sort():
                 for key, ti in list(tasks_to_run.items()):
+                    if task.task_id != ti.task_id:
+                        continue
+
+                    ti.refresh_from_db()
+
                     task = self.dag.get_task(ti.task_id)
                     ti.task = task
 
@@ -1779,6 +1847,7 @@ class BackfillJob(BaseJob):
                         ti.execution_date == (start_date or ti.start_date))
                     self.logger.debug("Task instance to run {} state {}"
                                       .format(ti, ti.state))
+
                     # The task was already marked successful or skipped by a
                     # different Job. Don't rerun it.
                     if ti.state == State.SUCCESS:
@@ -1786,178 +1855,130 @@ class BackfillJob(BaseJob):
                         self.logger.debug("Task instance {} succeeded. "
                                           "Don't rerun.".format(ti))
                         tasks_to_run.pop(key)
+                        if key in started:
+                            started.pop(key)
                         continue
                     elif ti.state == State.SKIPPED:
                         skipped.add(key)
                         self.logger.debug("Task instance {} skipped. "
                                           "Don't rerun.".format(ti))
                         tasks_to_run.pop(key)
+                        if key in started:
+                            started.pop(key)
                         continue
                     elif ti.state == State.FAILED:
                         self.logger.error("Task instance {} failed".format(ti))
                         failed.add(key)
                         tasks_to_run.pop(key)
+                        if key in started:
+                            started.pop(key)
+                        continue
+                    elif ti.state == State.UPSTREAM_FAILED:
+                        self.logger.error("Task instance {} upstream failed".format(ti))
+                        failed.add(key)
+                        tasks_to_run.pop(key)
+                        if key in started:
+                            started.pop(key)
                         continue
-
                     backfill_context = DepContext(
                         deps=RUN_DEPS,
                         ignore_depends_on_past=ignore_depends_on_past,
                         ignore_task_deps=self.ignore_task_deps,
                         flag_upstream_failed=True)
+
                     # Is the task runnable? -- then run it
+                    # the dependency checker can change states of tis
                     if ti.are_dependencies_met(
                             dep_context=backfill_context,
                             session=session,
                             verbose=True):
-                        self.logger.debug('Sending {} to executor'.format(ti))
-                        if ti.state == State.NONE:
-                            ti.state = State.SCHEDULED
+                        ti.refresh_from_db(lock_for_update=True, session=session)
+                        if ti.state == State.SCHEDULED or ti.state == State.UP_FOR_RETRY:
+                            # Skip scheduled state, we are executing immediately
+                            ti.state = State.QUEUED
                             session.merge(ti)
+                            self.logger.debug('Sending {} to executor'.format(ti))
+                            executor.queue_task_instance(
+                                ti,
+                                mark_success=self.mark_success,
+                                pickle_id=pickle_id,
+                                ignore_task_deps=self.ignore_task_deps,
+                                ignore_depends_on_past=ignore_depends_on_past,
+                                pool=self.pool)
+                            started[key] = ti
+                            tasks_to_run.pop(key)
                         session.commit()
-                        executor.queue_task_instance(
-                            ti,
-                            mark_success=self.mark_success,
-                            pickle_id=pickle_id,
-                            ignore_task_deps=self.ignore_task_deps,
-                            ignore_depends_on_past=ignore_depends_on_past,
-                            pool=self.pool)
-                        started.add(key)
-
-                    # Mark the task as not ready to run
-                    elif ti.state in (State.NONE, State.UPSTREAM_FAILED):
-                        self.logger.debug('Adding {} to not_ready'.format(ti))
-                        not_ready.add(key)
-
-                    session.commit()
-
-                self.heartbeat()
-                executor.heartbeat()
-
-                # If the set of tasks that aren't ready ever equals the set of
-                # tasks to run, then the backfill is deadlocked
-                if not_ready and not_ready == set(tasks_to_run):
-                    self.logger.warn("Deadlock discovered for tasks_to_run={}"
-                                     .format(tasks_to_run.values()))
-                    deadlocked.update(tasks_to_run.values())
-                    tasks_to_run.clear()
-
-                # Reacting to events
-                for key, state in list(executor.get_event_buffer().items()):
-                    if key not in tasks_to_run:
-                        self.logger.warn("{} state {} not in tasks_to_run={}"
-                                         .format(key, state,
-                                                 tasks_to_run.values()))
                         continue
-                    ti = tasks_to_run[key]
-                    ti.refresh_from_db()
-                    logging.info("Executor state: {} task {}".format(state, ti))
-                    # executor reports failure
-                    if state == State.FAILED:
-
-                        # task reports running
-                        if ti.state == State.RUNNING:
-                            msg = (
-                                'Executor reports that task instance {} failed '
-                                'although the task says it is running.'.format(ti))
-                            self.logger.error(msg)
-                            ti.handle_failure(msg)
-                            tasks_to_run.pop(key)
 
-                        # task reports skipped
-                        elif ti.state == State.SKIPPED:
-                            self.logger.error("Skipping {} ".format(ti))
-                            skipped.add(key)
-                            tasks_to_run.pop(key)
+                    if ti.state == State.UPSTREAM_FAILED:
+                        self.logger.error("Task instance {} upstream failed".format(ti))
+                        failed.add(key)
+                        tasks_to_run.pop(key)
+                        if key in started:
+                            started.pop(key)
+                        continue
 
-                        # anything else is a failure
-                        else:
-                            self.logger.error("Task instance {} failed".format(ti))
-                            failed.add(key)
-                            tasks_to_run.pop(key)
+                    # all remaining tasks
+                    self.logger.debug('Adding {} to not_ready'.format(ti))
+                    not_ready.add(key)
 
-                    # executor reports success
-                    elif state == State.SUCCESS:
+            # execute the tasks in the queue
+            self.heartbeat()
+            executor.heartbeat()
 
-                        # task reports success
-                        if ti.state == State.SUCCESS:
-                            self.logger.info(
-                                'Task instance {} succeeded'.format(ti))
-                            succeeded.add(key)
-                            tasks_to_run.pop(key)
+            # If the set of tasks that aren't ready ever equals the set of
+            # tasks to run and there are no running tasks then the backfill
+            # is deadlocked
+            if not_ready and not_ready == set(tasks_to_run) and len(started) == 0:
+                self.logger.warning("Deadlock discovered for tasks_to_run={}"
+                                    .format(tasks_to_run.values()))
+                deadlocked.update(tasks_to_run.values())
+                tasks_to_run.clear()
 
-                        # task reports failure
-                        elif ti.state == State.FAILED:
-                            self.logger.error("Task instance {} failed".format(ti))
-                            failed.add(key)
-                            tasks_to_run.pop(key)
+            # check executor state
+            self._manage_executor_state(started)
 
-                        # task reports skipped
-                        elif ti.state == State.SKIPPED:
-                            self.logger.info("Task instance {} skipped".format(ti))
-                            skipped.add(key)
-                            tasks_to_run.pop(key)
-
-                        # this probably won't ever be triggered
-                        elif ti in not_ready:
-                            self.logger.info(
-                                "{} wasn't expected to run, but it did".format(ti))
-
-                        # executor reports success but task does not - this is weird
-                        elif ti.state not in (
-                                State.SCHEDULED,
-                                State.QUEUED,
-                                State.UP_FOR_RETRY):
-                            self.logger.error(
-                                "The airflow run command failed "
-                                "at reporting an error. This should not occur "
-                                "in normal circumstances. Task state is '{}',"
-                                "reported state is '{}'. TI is {}"
-                                "".format(ti.state, state, ti))
-
-                            # if the executor fails 3 or more times, stop trying to
-                            # run the task
-                            executor_fails[key] += 1
-                            if executor_fails[key] >= 3:
-                                msg = (
-                                    'The airflow run command failed to report an '
-                                    'error for task {} three or more times. The '
-                                    'task is being marked as failed. This is very '
-                                    'unusual and probably means that an error is '
-                                    'taking place before the task even '
-                                    'starts.'.format(key))
-                                self.logger.error(msg)
-                                ti.handle_failure(msg)
-                                tasks_to_run.pop(key)
-                msg = ' | '.join([
-                    "[backfill progress]",
-                    "dag run {6} of {7}",
-                    "tasks waiting: {0}",
-                    "succeeded: {1}",
-                    "kicked_off: {2}",
-                    "failed: {3}",
-                    "skipped: {4}",
-                    "deadlocked: {5}"
-                ]).format(
-                    len(tasks_to_run),
-                    len(succeeded),
-                    len(started),
-                    len(failed),
-                    len(skipped),
-                    len(deadlocked),
-                    run_count,
-                    len(active_dag_runs))
-                self.logger.info(msg)
-
-                self.logger.debug("Finished dag run loop iteration. "
-                                  "Remaining tasks {}"
-                                  .format(tasks_to_run.values()))
-                if len(tasks_to_run) == 0:
-                    break
+            # update the task counters
+            self._update_counters(started=started, succeeded=succeeded,
+                                  skipped=skipped, failed=failed,
+                                  tasks_to_run=tasks_to_run)
 
             # update dag run state
-            run.update_state(session=session)
-            if run.dag.is_paused:
-                models.DagStat.clean_dirty([run.dag_id], session=session)
+            _dag_runs = active_dag_runs[:]
+            for run in _dag_runs:
+                run.update_state(session=session)
+                if run.state in State.finished():
+                    finished_runs += 1
+                    active_dag_runs.remove(run)
+
+                if run.dag.is_paused:
+                    models.DagStat.clean_dirty([run.dag_id], session=session)
+
+            msg = ' | '.join([
+                "[backfill progress]",
+                "finished run {0} of {1}",
+                "tasks waiting: {2}",
+                "succeeded: {3}",
+                "kicked_off: {4}",
+                "failed: {5}",
+                "skipped: {6}",
+                "deadlocked: {7}",
+                "not ready: {8}"
+            ]).format(
+                finished_runs,
+                total_runs,
+                len(tasks_to_run),
+                len(succeeded),
+                len(started),
+                len(failed),
+                len(skipped),
+                len(deadlocked),
+                len(not_ready))
+            self.logger.info(msg)
+
+            self.logger.debug("Finished dag run loop iteration. "
+                              "Remaining tasks {}"
+                              .format(tasks_to_run.values()))
 
         executor.end()
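
The behavioural crux of the reworked backfill loop above is twofold: tasks are now handed to the executor in topological order, and a deadlock is only declared when every remaining task is not ready *and* nothing is in flight with the executor (the added "and len(started) == 0" condition). Below is a minimal, self-contained sketch of that deadlock rule; it is illustrative only, not Airflow code, and all names (run_backfill, upstream, ...) are hypothetical simplifications.

    def run_backfill(tasks, upstream):
        """tasks: iterable of task ids; upstream: dict id -> set of upstream ids."""
        tasks_to_run = {t: t for t in tasks}
        started, succeeded, deadlocked = {}, set(), set()
        while (tasks_to_run or started) and not deadlocked:
            not_ready = set()
            for key in list(tasks_to_run):
                if upstream[key] <= succeeded:            # all upstreams finished
                    started[key] = tasks_to_run.pop(key)  # handed to the "executor"
                else:
                    not_ready.add(key)
            # deadlock only when nothing is runnable *and* nothing is in flight
            if not_ready and not_ready == set(tasks_to_run) and not started:
                deadlocked.update(tasks_to_run)
                tasks_to_run.clear()
            # pretend the executor heartbeat finishes everything it was given
            while started:
                succeeded.add(started.popitem()[0])
        return succeeded, deadlocked

    # a -> b -> c completes; a mutual dependency deadlocks immediately
    assert run_backfill("abc", {"a": set(), "b": {"a"}, "c": {"b"}})[1] == set()
    assert run_backfill("ab", {"a": {"b"}, "b": {"a"}})[1] == {"a", "b"}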
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/dcc8ede5/airflow/models.py
----------------------------------------------------------------------
diff --git a/airflow/models.py b/airflow/models.py
index 3fef407..e63da3e 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -3018,6 +3018,56 @@ class DAG(BaseDag, LoggingMixin):
     def roots(self):
         return [t for t in self.tasks if not t.downstream_list]
 
+    def topological_sort(self):
+        """
+        Sorts tasks in topological order, such that a task comes after any of its
+        upstream dependencies.
+
+        Heavily inspired by:
+        http://blog.jupo.org/2012/04/06/topological-sorting-acyclic-directed-graphs/
+        :returns: tuple of tasks in topological order
+        """
+
+        # copy the tasks so we leave the original list unmodified
+        graph_unsorted = self.tasks[:]
+
+        graph_sorted = []
+
+        # special case
+        if len(self.tasks) == 0:
+            return tuple(graph_sorted)
+
+        # Run until the unsorted graph is empty.
+        while graph_unsorted:
+            # Go through each of the node/edges pairs in the unsorted
+            # graph. If a set of edges doesn't contain any nodes that
+            # haven't been resolved, that is, that are still in the
+            # unsorted graph, remove the pair from the unsorted graph,
+            # and append it to the sorted graph. Note here that by
+            # iterating over a copy of the unsorted graph we are able
+            # to modify it as we move through it. We also keep a flag
+            # for checking that the graph is acyclic, which is true if
+            # any nodes are resolved during each pass through the
+            # graph. If not, we need to bail out as the graph therefore
+            # can't be sorted.
+            acyclic = False
+            for node in list(graph_unsorted):
+                for edge in node.upstream_list:
+                    if edge in graph_unsorted:
+                        break
+                # no edges in upstream tasks
+                else:
+                    acyclic = True
+                    graph_unsorted.remove(node)
+                    graph_sorted.append(node)
+
+            if not acyclic:
+                raise AirflowException("A cyclic dependency occurred in dag: {}"
+                                       .format(self.dag_id))
+
+        return tuple(graph_sorted)
+
     @provide_session
     def set_dag_runs_state(
             self, state=State.RUNNING, session=None):
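
The same resolve-and-remove approach used by DAG.topological_sort above can be sketched on a plain dependency dict; this standalone version (hypothetical names, not the committed method) may help when reading the loop.

    def topological_sort(upstream):
        """upstream: dict node -> set of upstream nodes; returns a tuple."""
        unsorted = dict(upstream)
        ordered = []
        while unsorted:
            acyclic = False
            for node, ups in list(unsorted.items()):
                if not ups & set(unsorted):   # no unresolved upstream left
                    acyclic = True
                    del unsorted[node]
                    ordered.append(node)
            if not acyclic:
                raise ValueError("cyclic dependency detected")
        return tuple(ordered)

    # A depends on B and C, C depends on D
    print(topological_sort({"A": {"B", "C"}, "B": set(), "C": {"D"}, "D": set()}))
    # one valid order, e.g. ('B', 'D', 'C', 'A')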

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/dcc8ede5/tests/executor/test_executor.py
----------------------------------------------------------------------
diff --git a/tests/executor/test_executor.py b/tests/executor/test_executor.py
index 2015d9c..9ec6cd4 100644
--- a/tests/executor/test_executor.py
+++ b/tests/executor/test_executor.py
@@ -12,18 +12,41 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 from airflow.executors.base_executor import BaseExecutor
+from airflow.utils.state import State
+
+from airflow import settings
 
 
 class TestExecutor(BaseExecutor):
     """
     TestExecutor is used for unit testing purposes.
     """
+    def __init__(self, do_update=False, *args, **kwargs):
+        self.do_update = do_update
+        self._running = []
+        self.history = []
+
+        super(TestExecutor, self).__init__(*args, **kwargs)
+
     def execute_async(self, key, command, queue=None):
         self.logger.debug("{} running task instances".format(len(self.running)))
         self.logger.debug("{} in queue".format(len(self.queued_tasks)))
 
     def heartbeat(self):
-        pass
+        session = settings.Session()
+        if self.do_update:
+            self.history.append(list(self.queued_tasks.values()))
+            while len(self._running) > 0:
+                ti = self._running.pop()
+                ti.set_state(State.SUCCESS, session)
+            for key, val in list(self.queued_tasks.items()):
+                (command, priority, queue, ti) = val
+                ti.set_state(State.RUNNING, session)
+                self._running.append(ti)
+                self.queued_tasks.pop(key)
+
+        session.commit()
+        session.close()
 
     def terminate(self):
         pass

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/dcc8ede5/tests/jobs.py
----------------------------------------------------------------------
diff --git a/tests/jobs.py b/tests/jobs.py
index 1f7950e..1acf269 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -162,6 +162,54 @@ class BackfillJobTest(unittest.TestCase):
                 ignore_first_depends_on_past=True)
             job.run()
 
+    def test_backfill_ordered_concurrent_execute(self):
+        dag = DAG(
+            dag_id='test_backfill_ordered_concurrent_execute',
+            start_date=DEFAULT_DATE,
+            schedule_interval="@daily")
+
+        with dag:
+            op1 = DummyOperator(task_id='leave1')
+            op2 = DummyOperator(task_id='leave2')
+            op3 = DummyOperator(task_id='upstream_level_1')
+            op4 = DummyOperator(task_id='upstream_level_2')
+            op5 = DummyOperator(task_id='upstream_level_3')
+            # order randomly
+            op2.set_downstream(op3)
+            op1.set_downstream(op3)
+            op4.set_downstream(op5)
+            op3.set_downstream(op4)
+
+        dag.clear()
+
+        executor = TestExecutor(do_update=True)
+        job = BackfillJob(dag=dag,
+                          executor=executor,
+                          start_date=DEFAULT_DATE,
+                          end_date=DEFAULT_DATE + datetime.timedelta(days=2),
+                          )
+        job.run()
+
+        # test executor history keeps a list
+        history = executor.history
+
+        # check the ordering. Every other loop is a 'pause' (0 queued tasks)
+        # in which the test executor flips tasks from RUNNING to SUCCESS.
+        # queued tasks per loop: 6,0,3,0,3,0,3,0 = 8 loops
+        self.assertEqual(8, len(history))
+
+        loop_count = 0
+
+        while len(history) > 0:
+            queued_tasks = history.pop(0)
+            if loop_count == 0:
+                # first loop should contain 6 tasks (3 days x 2 tasks)
+                self.assertEqual(6, len(queued_tasks))
+            if loop_count == 2 or loop_count == 4 or loop_count == 6:
+                # 3 days x 1 task
+                self.assertEqual(3, len(queued_tasks))
+            loop_count += 1
+
     def test_backfill_pooled_tasks(self):
         """
         Test that queued tasks are executed by BackfillJob

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/dcc8ede5/tests/models.py
----------------------------------------------------------------------
diff --git a/tests/models.py b/tests/models.py
index d904ff3..55117d4 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -18,6 +18,7 @@ from __future__ import print_function
 from __future__ import unicode_literals
 
 import datetime
+import logging
 import os
 import unittest
 import time
@@ -118,6 +119,71 @@ class DagTest(unittest.TestCase):
         self.assertEqual(dag.dag_id, 'creating_dag_in_cm')
         self.assertEqual(dag.tasks[0].task_id, 'op6')
 
+    def test_dag_topological_sort(self):
+        dag = DAG(
+            'dag',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        # A -> B
+        # A -> C -> D
+        # ordered: B, D, C, A or D, B, C, A or D, C, B, A
+        with dag:
+            op1 = DummyOperator(task_id='A')
+            op2 = DummyOperator(task_id='B')
+            op3 = DummyOperator(task_id='C')
+            op4 = DummyOperator(task_id='D')
+            op1.set_upstream([op2, op3])
+            op3.set_upstream(op4)
+
+        topological_list = dag.topological_sort()
+        logging.info(topological_list)
+
+        tasks = [op2, op3, op4]
+        self.assertTrue(topological_list[0] in tasks)
+        tasks.remove(topological_list[0])
+        self.assertTrue(topological_list[1] in tasks)
+        tasks.remove(topological_list[1])
+        self.assertTrue(topological_list[2] in tasks)
+        tasks.remove(topological_list[2])
+        self.assertTrue(topological_list[3] == op1)
+
+        dag = DAG(
+            'dag',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        # C -> (A u B) -> D
+        # C -> E
+        # ordered: E | D, A | B, C
+        with dag:
+            op1 = DummyOperator(task_id='A')
+            op2 = DummyOperator(task_id='B')
+            op3 = DummyOperator(task_id='C')
+            op4 = DummyOperator(task_id='D')
+            op5 = DummyOperator(task_id='E')
+            op1.set_downstream(op3)
+            op2.set_downstream(op3)
+            op1.set_upstream(op4)
+            op2.set_upstream(op4)
+            op5.set_downstream(op3)
+
+        topological_list = dag.topological_sort()
+        logging.info(topological_list)
+
+        self.assertTrue(topological_list[0] == op5 or topological_list[0] == op4)
+        self.assertTrue(topological_list[1] == op4 or topological_list[1] == op5)
+        self.assertTrue(topological_list[2] == op1 or topological_list[2] == op2)
+        self.assertTrue(topological_list[3] == op1 or topological_list[3] == op2)
+        self.assertTrue(topological_list[4] == op3)
+
+        dag = DAG(
+            'dag',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        self.assertEquals(tuple(), dag.topological_sort())
+
 
 class DagRunTest(unittest.TestCase):
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/dcc8ede5/tests/operators/subdag_operator.py
----------------------------------------------------------------------
diff --git a/tests/operators/subdag_operator.py b/tests/operators/subdag_operator.py
index 6a25ac3..6f6847c 100644
--- a/tests/operators/subdag_operator.py
+++ b/tests/operators/subdag_operator.py
@@ -91,8 +91,8 @@ class SubDagOperatorTests(unittest.TestCase):
         subdag = dagbag.get_dag('test_subdag_deadlock.subdag')
         subdag.clear()
 
-        # first make sure subdag is deadlocked
-        self.assertRaisesRegexp(AirflowException, 'deadlocked', subdag.run, start_date=DEFAULT_DATE, end_date=DEFAULT_DATE)
+        # first make sure subdag has failed
+        self.assertRaises(AirflowException, subdag.run, start_date=DEFAULT_DATE, end_date=DEFAULT_DATE)
 
         # now make sure dag picks up the subdag error
         self.assertRaises(AirflowException, dag.run, start_date=DEFAULT_DATE, end_date=DEFAULT_DATE)


[19/45] incubator-airflow git commit: [AIRFLOW-794] Access DAGS_FOLDER and SQL_ALCHEMY_CONN exclusively from settings

Posted by bo...@apache.org.
[AIRFLOW-794] Access DAGS_FOLDER and SQL_ALCHEMY_CONN exclusively from settings

Closes #2013 from gsakkis/settings
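
Note that airflow/settings.py itself does not appear in the diffstat below, so the definitions this change relies on are not shown in this commit. They presumably amount to module-level constants along the following lines (a hedged sketch only, assuming the values are still read from the same configuration keys):

    # airflow/settings.py (sketch only; the committed file may differ)
    import os
    from airflow import configuration as conf

    DAGS_FOLDER = os.path.expanduser(conf.get('core', 'DAGS_FOLDER'))
    SQL_ALCHEMY_CONN = conf.get('core', 'SQL_ALCHEMY_CONN')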


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/25920242
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/25920242
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/25920242

Branch: refs/heads/v1-8-stable
Commit: 2592024230a25820d368ecc3bd43fbf7b52e46d9
Parents: 5405f5f
Author: George Sakkis <ge...@gmail.com>
Authored: Thu Feb 2 14:45:48 2017 +0100
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 07:57:04 2017 -0700

----------------------------------------------------------------------
 airflow/__init__.py                  | 8 +++-----
 airflow/bin/cli.py                   | 9 ++-------
 airflow/configuration.py             | 6 ------
 airflow/jobs.py                      | 2 +-
 airflow/migrations/env.py            | 6 ++----
 airflow/models.py                    | 8 +++-----
 airflow/operators/dagrun_operator.py | 3 +--
 airflow/utils/db.py                  | 4 +---
 airflow/www/utils.py                 | 4 +---
 airflow/www/views.py                 | 2 +-
 tests/core.py                        | 3 +--
 tests/jobs.py                        | 9 ++++-----
 12 files changed, 20 insertions(+), 44 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/airflow/__init__.py
----------------------------------------------------------------------
diff --git a/airflow/__init__.py b/airflow/__init__.py
index 1e40fe9..3daa6e2 100644
--- a/airflow/__init__.py
+++ b/airflow/__init__.py
@@ -24,19 +24,17 @@ from airflow import version
 __version__ = version.version
 
 import logging
-import os
 import sys
 
 from airflow import configuration as conf
-
+from airflow import settings
 from airflow.models import DAG
 from flask_admin import BaseView
 from importlib import import_module
 from airflow.exceptions import AirflowException
 
-DAGS_FOLDER = os.path.expanduser(conf.get('core', 'DAGS_FOLDER'))
-if DAGS_FOLDER not in sys.path:
-    sys.path.append(DAGS_FOLDER)
+if settings.DAGS_FOLDER not in sys.path:
+    sys.path.append(settings.DAGS_FOLDER)
 
 login = None
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/airflow/bin/cli.py
----------------------------------------------------------------------
diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index fbd86db..61d8707 100755
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -58,10 +58,8 @@ from airflow.www.app import cached_app
 from sqlalchemy import func
 from sqlalchemy.orm import exc
 
-DAGS_FOLDER = os.path.expanduser(conf.get('core', 'DAGS_FOLDER'))
 
 api.load_auth()
-
 api_module = import_module(conf.get('cli', 'api_client'))
 api_client = api_module.Client(api_base_url=conf.get('cli', 'endpoint_url'),
                                auth=api.api_auth.client_auth)
@@ -114,11 +112,8 @@ def setup_locations(process, pid=None, stdout=None, stderr=None, log=None):
 
 
 def process_subdir(subdir):
-    dags_folder = conf.get("core", "DAGS_FOLDER")
-    dags_folder = os.path.expanduser(dags_folder)
     if subdir:
-        if "DAGS_FOLDER" in subdir:
-            subdir = subdir.replace("DAGS_FOLDER", dags_folder)
+        subdir = subdir.replace('DAGS_FOLDER', settings.DAGS_FOLDER)
         subdir = os.path.abspath(os.path.expanduser(subdir))
         return subdir
 
@@ -1128,7 +1123,7 @@ class CLIFactory(object):
         'subdir': Arg(
             ("-sd", "--subdir"),
             "File location or directory from which to look for the dag",
-            default=DAGS_FOLDER),
+            default=settings.DAGS_FOLDER),
         'start_date': Arg(
             ("-s", "--start_date"), "Override start_date YYYY-MM-DD",
             type=parsedate),

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/airflow/configuration.py
----------------------------------------------------------------------
diff --git a/airflow/configuration.py b/airflow/configuration.py
index 404808b..6752bdb 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -828,9 +828,3 @@ as_dict.__doc__ = conf.as_dict.__doc__
 
 def set(section, option, value):  # noqa
     return conf.set(section, option, value)
-
-########################
-# convenience method to access config entries
-
-def get_dags_folder():
-    return os.path.expanduser(get('core', 'DAGS_FOLDER'))

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/airflow/jobs.py
----------------------------------------------------------------------
diff --git a/airflow/jobs.py b/airflow/jobs.py
index 3ca0070..fedad55 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -463,7 +463,7 @@ class SchedulerJob(BaseJob):
             self,
             dag_id=None,
             dag_ids=None,
-            subdir=models.DAGS_FOLDER,
+            subdir=settings.DAGS_FOLDER,
             num_runs=-1,
             file_process_interval=conf.getint('scheduler',
                                               'min_file_process_interval'),

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/airflow/migrations/env.py
----------------------------------------------------------------------
diff --git a/airflow/migrations/env.py b/airflow/migrations/env.py
index a107d6c..8d5e55e 100644
--- a/airflow/migrations/env.py
+++ b/airflow/migrations/env.py
@@ -17,7 +17,6 @@ from alembic import context
 from logging.config import fileConfig
 
 from airflow import settings
-from airflow import configuration
 from airflow.jobs import models
 
 # this is the Alembic Config object, which provides
@@ -54,10 +53,9 @@ def run_migrations_offline():
     script output.
 
     """
-    url = configuration.get('core', 'SQL_ALCHEMY_CONN')
     context.configure(
-        url=url, target_metadata=target_metadata, literal_binds=True,
-        compare_type=COMPARE_TYPE)
+        url=settings.SQL_ALCHEMY_CONN, target_metadata=target_metadata,
+        literal_binds=True, compare_type=COMPARE_TYPE)
 
     with context.begin_transaction():
         context.run_migrations()

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/airflow/models.py
----------------------------------------------------------------------
diff --git a/airflow/models.py b/airflow/models.py
index d6ab5b8..1829ff3 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -81,8 +81,6 @@ from airflow.utils.trigger_rule import TriggerRule
 
 Base = declarative_base()
 ID_LEN = 250
-SQL_ALCHEMY_CONN = configuration.get('core', 'SQL_ALCHEMY_CONN')
-DAGS_FOLDER = os.path.expanduser(configuration.get('core', 'DAGS_FOLDER'))
 XCOM_RETURN_KEY = 'return_value'
 
 Stats = settings.Stats
@@ -95,7 +93,7 @@ try:
 except:
     pass
 
-if 'mysql' in SQL_ALCHEMY_CONN:
+if 'mysql' in settings.SQL_ALCHEMY_CONN:
     LongText = LONGTEXT
 else:
     LongText = Text
@@ -165,7 +163,7 @@ class DagBag(BaseDagBag, LoggingMixin):
             executor=DEFAULT_EXECUTOR,
             include_examples=configuration.getboolean('core', 'LOAD_EXAMPLES')):
 
-        dag_folder = dag_folder or DAGS_FOLDER
+        dag_folder = dag_folder or settings.DAGS_FOLDER
         self.logger.info("Filling up the DagBag from {}".format(dag_folder))
         self.dag_folder = dag_folder
         self.dags = {}
@@ -2858,7 +2856,7 @@ class DAG(BaseDag, LoggingMixin):
         """
         File location of where the dag object is instantiated
         """
-        fn = self.full_filepath.replace(DAGS_FOLDER + '/', '')
+        fn = self.full_filepath.replace(settings.DAGS_FOLDER + '/', '')
         fn = fn.replace(os.path.dirname(__file__) + '/', '')
         return fn
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/airflow/operators/dagrun_operator.py
----------------------------------------------------------------------
diff --git a/airflow/operators/dagrun_operator.py b/airflow/operators/dagrun_operator.py
index 239ebb4..c3ffa1a 100644
--- a/airflow/operators/dagrun_operator.py
+++ b/airflow/operators/dagrun_operator.py
@@ -14,7 +14,6 @@
 
 from datetime import datetime
 import logging
-import os
 
 from airflow.models import BaseOperator, DagBag
 from airflow.utils.decorators import apply_defaults
@@ -65,7 +64,7 @@ class TriggerDagRunOperator(BaseOperator):
         dro = self.python_callable(context, dro)
         if dro:
             session = settings.Session()
-            dbag = DagBag(os.path.expanduser(conf.get('core', 'DAGS_FOLDER')))
+            dbag = DagBag(settings.DAGS_FOLDER)
             trigger_dag = dbag.get_dag(self.trigger_dag_id)
             dr = trigger_dag.create_dagrun(
                 run_id=dro.run_id,

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/airflow/utils/db.py
----------------------------------------------------------------------
diff --git a/airflow/utils/db.py b/airflow/utils/db.py
index 9c7b4b3..2502219 100644
--- a/airflow/utils/db.py
+++ b/airflow/utils/db.py
@@ -30,7 +30,6 @@ from sqlalchemy import event, exc
 from sqlalchemy.pool import Pool
 
 from airflow import settings
-from airflow import configuration
 
 
 def provide_session(func):
@@ -287,8 +286,7 @@ def upgradedb():
     directory = os.path.join(package_dir, 'migrations')
     config = Config(os.path.join(package_dir, 'alembic.ini'))
     config.set_main_option('script_location', directory)
-    config.set_main_option('sqlalchemy.url',
-                           configuration.get('core', 'SQL_ALCHEMY_CONN'))
+    config.set_main_option('sqlalchemy.url', settings.SQL_ALCHEMY_CONN)
     command.upgrade(config, 'heads')
 
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/airflow/www/utils.py
----------------------------------------------------------------------
diff --git a/airflow/www/utils.py b/airflow/www/utils.py
index 1a1229b..d2218de 100644
--- a/airflow/www/utils.py
+++ b/airflow/www/utils.py
@@ -137,9 +137,7 @@ def notify_owner(f):
         if request.args.get('confirmed') == "true":
             dag_id = request.args.get('dag_id')
             task_id = request.args.get('task_id')
-            dagbag = models.DagBag(
-                os.path.expanduser(configuration.get('core', 'DAGS_FOLDER')))
-
+            dagbag = models.DagBag(settings.DAGS_FOLDER)
             dag = dagbag.get_dag(dag_id)
             task = dag.get_task(task_id)
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/airflow/www/views.py
----------------------------------------------------------------------
diff --git a/airflow/www/views.py b/airflow/www/views.py
index 9e68079..0391775 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -77,7 +77,7 @@ from airflow.configuration import AirflowConfigException
 QUERY_LIMIT = 100000
 CHART_LIMIT = 200000
 
-dagbag = models.DagBag(os.path.expanduser(conf.get('core', 'DAGS_FOLDER')))
+dagbag = models.DagBag(settings.DAGS_FOLDER)
 
 login_required = airflow.login.login_required
 current_user = airflow.login.current_user

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/tests/core.py
----------------------------------------------------------------------
diff --git a/tests/core.py b/tests/core.py
index 3e76e81..ee7a738 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -15,7 +15,6 @@
 from __future__ import print_function
 
 import doctest
-import json
 import os
 import re
 import unittest
@@ -1315,7 +1314,7 @@ class CliTests(unittest.TestCase):
             '-s', DEFAULT_DATE.isoformat()]))
 
     def test_process_subdir_path_with_placeholder(self):
-        assert cli.process_subdir('DAGS_FOLDER/abc') == os.path.join(configuration.get_dags_folder(), 'abc')
+        assert cli.process_subdir('DAGS_FOLDER/abc') == os.path.join(settings.DAGS_FOLDER, 'abc')
 
     def test_trigger_dag(self):
         cli.trigger_dag(self.parser.parse_args([

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/25920242/tests/jobs.py
----------------------------------------------------------------------
diff --git a/tests/jobs.py b/tests/jobs.py
index ee4c8a7..44087e1 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -27,7 +27,6 @@ import sys
 from tempfile import mkdtemp
 
 from airflow import AirflowException, settings
-from airflow import models
 from airflow.bin import cli
 from airflow.jobs import BackfillJob, SchedulerJob
 from airflow.models import DAG, DagModel, DagBag, DagRun, Pool, TaskInstance as TI
@@ -817,7 +816,7 @@ class SchedulerJobTest(unittest.TestCase):
         # Recreated part of the scheduler here, to kick off tasks -> executor
         for ti_key in queue:
             task = dag.get_task(ti_key[1])
-            ti = models.TaskInstance(task, ti_key[2])
+            ti = TI(task, ti_key[2])
             # Task starts out in the scheduled state. All tasks in the
             # scheduled state will be sent to the executor
             ti.state = State.SCHEDULED
@@ -921,7 +920,7 @@ class SchedulerJobTest(unittest.TestCase):
             # try to schedule the above DAG repeatedly.
             scheduler = SchedulerJob(num_runs=1,
                                      executor=executor,
-                                     subdir=os.path.join(models.DAGS_FOLDER,
+                                     subdir=os.path.join(settings.DAGS_FOLDER,
                                                          "no_dags.py"))
             scheduler.heartrate = 0
             scheduler.run()
@@ -973,7 +972,7 @@ class SchedulerJobTest(unittest.TestCase):
             # try to schedule the above DAG repeatedly.
             scheduler = SchedulerJob(num_runs=1,
                                      executor=executor,
-                                     subdir=os.path.join(models.DAGS_FOLDER,
+                                     subdir=os.path.join(settings.DAGS_FOLDER,
                                                          "no_dags.py"))
             scheduler.heartrate = 0
             scheduler.run()
@@ -1066,7 +1065,7 @@ class SchedulerJobTest(unittest.TestCase):
 
         dag_id = 'exit_test_dag'
         dag_ids = [dag_id]
-        dag_directory = os.path.join(models.DAGS_FOLDER,
+        dag_directory = os.path.join(settings.DAGS_FOLDER,
                                      "..",
                                      "dags_with_system_exit")
         dag_file = os.path.join(dag_directory,


[17/45] incubator-airflow git commit: [AIRFLOW-365] Set dag.fileloc explicitly and use for Code view

Posted by bo...@apache.org.
[AIRFLOW-365] Set dag.fileloc explicitly and use for Code view

Code view for subdags has not been working. I do not think we are able to
cleanly figure out where the code for the factory method lives when we
process the dags, so we need to save the location when the subdag is
created.

Previously, a subdag's `fileloc` attribute would be set to the location of
the parent dag. I think it is more appropriate to set it to the actual
child dag location instead. We do not lose any information this way (we
still have the link to the parent dag, which has its location) and now we
can always read this attribute for the code view. This should not affect
the use of this field for refreshing dags, because we always refresh the
parent for a subdag.

Closes #2043 from dhuang/AIRFLOW-365
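
The mechanism behind the new fileloc assignment (see the inspect.getsourcefile(inspect.stack()[1][0]) line in the diff below) can be illustrated in isolation. The class name here is hypothetical; the point is simply that one frame above __init__ is the caller, i.e. the file in which the object was instantiated.

    import inspect

    class Recorder(object):
        def __init__(self):
            # frame [1] is whoever called Recorder(); getsourcefile() gives its file
            self.fileloc = inspect.getsourcefile(inspect.stack()[1][0])

    r = Recorder()
    print(r.fileloc)  # path of the module/script that instantiated Recorder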


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a7abcf35
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a7abcf35
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a7abcf35

Branch: refs/heads/v1-8-stable
Commit: a7abcf35b0e228034f746b3d50abd0ca9bd8bede
Parents: 4db8f07
Author: Daniel Huang <dx...@gmail.com>
Authored: Thu Feb 2 13:57:20 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Sun Mar 12 07:54:02 2017 -0700

----------------------------------------------------------------------
 airflow/models.py    |  7 ++++---
 airflow/www/views.py |  5 ++---
 tests/models.py      | 18 ++++++++++++++++++
 3 files changed, 24 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a7abcf35/airflow/models.py
----------------------------------------------------------------------
diff --git a/airflow/models.py b/airflow/models.py
index 62457f0..d6ab5b8 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -29,6 +29,7 @@ import functools
 import getpass
 import imp
 import importlib
+import inspect
 import zipfile
 import jinja2
 import json
@@ -307,7 +308,6 @@ class DagBag(BaseDagBag, LoggingMixin):
                     if not dag.full_filepath:
                         dag.full_filepath = filepath
                     dag.is_subdag = False
-                    dag.module_name = m.__name__
                     self.bag_dag(dag, parent_dag=dag, root_dag=dag)
                     found_dags.append(dag)
                     found_dags += dag.subdags
@@ -367,7 +367,6 @@ class DagBag(BaseDagBag, LoggingMixin):
         for subdag in dag.subdags:
             subdag.full_filepath = dag.full_filepath
             subdag.parent_dag = dag
-            subdag.fileloc = root_dag.full_filepath
             subdag.is_subdag = True
             self.bag_dag(subdag, parent_dag=dag, root_dag=root_dag)
         self.logger.debug('Loaded DAG {dag}'.format(**locals()))
@@ -2660,6 +2659,8 @@ class DAG(BaseDag, LoggingMixin):
         self._pickle_id = None
 
         self._description = description
+        # set file location to caller source path
+        self.fileloc = inspect.getsourcefile(inspect.stack()[1][0])
         self.task_dict = dict()
         self.start_date = start_date
         self.end_date = end_date
@@ -3355,7 +3356,7 @@ class DAG(BaseDag, LoggingMixin):
             orm_dag = DagModel(dag_id=dag.dag_id)
             logging.info("Creating ORM DAG for %s",
                          dag.dag_id)
-        orm_dag.fileloc = dag.full_filepath
+        orm_dag.fileloc = dag.fileloc
         orm_dag.is_subdag = dag.is_subdag
         orm_dag.owners = owner
         orm_dag.is_active = True

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a7abcf35/airflow/www/views.py
----------------------------------------------------------------------
diff --git a/airflow/www/views.py b/airflow/www/views.py
index b98bd74..9e68079 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -18,7 +18,6 @@ from past.builtins import basestring, unicode
 import os
 import pkg_resources
 import socket
-import importlib
 from functools import wraps
 from datetime import datetime, timedelta
 import dateutil.parser
@@ -577,8 +576,8 @@ class Airflow(BaseView):
         dag = dagbag.get_dag(dag_id)
         title = dag_id
         try:
-            m = importlib.import_module(dag.module_name)
-            code = inspect.getsource(m)
+            with open(dag.fileloc, 'r') as f:
+                code = f.read()
             html_code = highlight(
                 code, lexers.PythonLexer(), HtmlFormatter(linenos=True))
         except IOError as e:

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a7abcf35/tests/models.py
----------------------------------------------------------------------
diff --git a/tests/models.py b/tests/models.py
index 867e293..7ca01e7 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -200,6 +200,24 @@ class DagBagTest(unittest.TestCase):
         assert dagbag.get_dag(dag_id) != None
         assert dagbag.process_file_calls == 1
 
+    def test_get_dag_fileloc(self):
+        """
+        Test that fileloc is correctly set when we load example DAGs,
+        specifically SubDAGs.
+        """
+        dagbag = models.DagBag(include_examples=True)
+
+        expected = {
+            'example_bash_operator': 'example_bash_operator.py',
+            'example_subdag_operator': 'example_subdag_operator.py',
+            'example_subdag_operator.section-1': 'subdags/subdag.py'
+        }
+
+        for dag_id, path in expected.items():
+            dag = dagbag.get_dag(dag_id)
+            self.assertTrue(
+                dag.fileloc.endswith('airflow/example_dags/' + path))
+
 
 class TaskInstanceTest(unittest.TestCase):
 


[20/45] incubator-airflow git commit: [AIRFLOW-831] Restore import to fix broken tests

Posted by bo...@apache.org.
[AIRFLOW-831] Restore import to fix broken tests

The global `models` object is used in the code and was inadvertently
removed. This PR restores it.

Closes #2050 from jlowin/fix-broken-tests


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/e1d0adb6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/e1d0adb6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/e1d0adb6

Branch: refs/heads/v1-8-stable
Commit: e1d0adb61d6475154ada7347ea30404f0680e779
Parents: 2592024
Author: Jeremiah Lowin <jl...@apache.org>
Authored: Thu Feb 2 11:56:22 2017 -0500
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 07:58:12 2017 -0700

----------------------------------------------------------------------
 tests/jobs.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e1d0adb6/tests/jobs.py
----------------------------------------------------------------------
diff --git a/tests/jobs.py b/tests/jobs.py
index 44087e1..e520b44 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -26,7 +26,7 @@ import six
 import sys
 from tempfile import mkdtemp
 
-from airflow import AirflowException, settings
+from airflow import AirflowException, settings, models
 from airflow.bin import cli
 from airflow.jobs import BackfillJob, SchedulerJob
 from airflow.models import DAG, DagModel, DagBag, DagRun, Pool, TaskInstance as TI


[34/45] incubator-airflow git commit: [AIRFLOW-967] Wrap strings in native for py2 ldap compatibility

Posted by bo...@apache.org.
[AIRFLOW-967] Wrap strings in native for py2 ldap compatibility

ldap3 has issues with newstr being passed. This wraps any call that goes
over the wire to the ldap server in native() to ensure the native string
type is used.

Closes #2141 from bolkedebruin/AIRFLOW-967
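
A minimal sketch of the pitfall being worked around, assuming python-future's builtins are in use (the bind DN shown is illustrative, not taken from the codebase): on Python 2 a newstr can end up being passed to ldap3, and future.utils.native() converts it back to the interpreter's native string type before it goes over the wire; on Python 3 native() is a no-op.

    from builtins import str          # with python-future, this str is newstr on py2
    from future.utils import native

    bind_dn = str("cn=admin,dc=example,dc=com")   # hypothetical value
    print(type(bind_dn))              # py2: future.types.newstr.newstr
    print(type(native(bind_dn)))      # py2: native str; py3: str (no-op)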


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/8ffaadf1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/8ffaadf1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/8ffaadf1

Branch: refs/heads/v1-8-stable
Commit: 8ffaadf173e1cd46661a592ad55b0d41e460c05a
Parents: 1f3aead
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Fri Mar 10 12:00:16 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:32:02 2017 -0700

----------------------------------------------------------------------
 airflow/contrib/auth/backends/ldap_auth.py | 26 +++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8ffaadf1/airflow/contrib/auth/backends/ldap_auth.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/auth/backends/ldap_auth.py b/airflow/contrib/auth/backends/ldap_auth.py
index 24a63bc..13b49f9 100644
--- a/airflow/contrib/auth/backends/ldap_auth.py
+++ b/airflow/contrib/auth/backends/ldap_auth.py
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+from future.utils import native
 
 import flask_login
 from flask_login import login_required, current_user, logout_user
@@ -60,7 +61,7 @@ def get_ldap_connection(dn=None, password=None):
         pass
 
     server = Server(configuration.get("ldap", "uri"), use_ssl, tls_configuration)
-    conn = Connection(server, dn, password)
+    conn = Connection(server, native(dn), native(password))
 
     if not conn.bind():
         LOG.error("Cannot bind to ldap server: %s ", conn.last_error)
@@ -71,14 +72,15 @@ def get_ldap_connection(dn=None, password=None):
 
 def group_contains_user(conn, search_base, group_filter, user_name_attr, username):
     search_filter = '(&({0}))'.format(group_filter)
-    if not conn.search(search_base, search_filter, attributes=[user_name_attr]):
-        LOG.warn("Unable to find group for %s %s", search_base, search_filter)
+    if not conn.search(native(search_base), native(search_filter),
+                       attributes=[native(user_name_attr)]):
+        LOG.warning("Unable to find group for %s %s", search_base, search_filter)
     else:
         for resp in conn.response:
             if (
-                'attributes' in resp and (
-                    resp['attributes'].get(user_name_attr)[0] == username or
-                    resp['attributes'].get(user_name_attr) == username
+                        'attributes' in resp and (
+                            resp['attributes'].get(user_name_attr)[0] == username or
+                            resp['attributes'].get(user_name_attr) == username
                 )
             ):
                 return True
@@ -87,7 +89,7 @@ def group_contains_user(conn, search_base, group_filter, user_name_attr, usernam
 
 def groups_user(conn, search_base, user_filter, user_name_att, username):
     search_filter = "(&({0})({1}={2}))".format(user_filter, user_name_att, username)
-    res = conn.search(search_base, search_filter, attributes=["memberOf"])
+    res = conn.search(native(search_base), native(search_filter), attributes=[native("memberOf")])
     if not res:
         LOG.info("Cannot find user %s", username)
         raise AuthenticationError("Invalid username or password")
@@ -118,7 +120,8 @@ class LdapUser(models.User):
         self.ldap_groups = []
 
         # Load and cache superuser and data_profiler settings.
-        conn = get_ldap_connection(configuration.get("ldap", "bind_user"), configuration.get("ldap", "bind_password"))
+        conn = get_ldap_connection(configuration.get("ldap", "bind_user"),
+                                   configuration.get("ldap", "bind_password"))
         try:
             self.superuser = group_contains_user(conn,
                                                  configuration.get("ldap", "basedn"),
@@ -151,7 +154,8 @@ class LdapUser(models.User):
 
     @staticmethod
     def try_login(username, password):
-        conn = get_ldap_connection(configuration.get("ldap", "bind_user"), configuration.get("ldap", "bind_password"))
+        conn = get_ldap_connection(configuration.get("ldap", "bind_user"),
+                                   configuration.get("ldap", "bind_password"))
 
         search_filter = "(&({0})({1}={2}))".format(
             configuration.get("ldap", "user_filter"),
@@ -171,7 +175,9 @@ class LdapUser(models.User):
 
         # todo: BASE or ONELEVEL?
 
-        res = conn.search(configuration.get("ldap", "basedn"), search_filter, search_scope=search_scope)
+        res = conn.search(native(configuration.get("ldap", "basedn")),
+                          native(search_filter),
+                          search_scope=native(search_scope))
 
         # todo: use list or result?
         if not res:
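
The native() calls above exist because, under python-future on Python 2,
configuration values can come back as backported string objects that strict
isinstance() checks inside ldap3 may reject. A minimal sketch of the
conversion (assuming the python-future package is installed; the bind DN is
hypothetical):

    from future.utils import native
    from builtins import str as text   # python-future's backported str on Python 2

    dn = text("cn=admin,dc=example,dc=com")   # hypothetical bind DN
    # On Python 2, type(dn) is future.types.newstr.newstr; native() hands the
    # interpreter-native str/unicode back to ldap3. On Python 3 it is a no-op
    # and returns the object unchanged.
    print(type(native(dn)))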


[45/45] incubator-airflow git commit: Merge branch 'v1-8-test' into v1-8-stable

Posted by bo...@apache.org.
Merge branch 'v1-8-test' into v1-8-stable


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/f4760c32
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/f4760c32
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/f4760c32

Branch: refs/heads/v1-8-stable
Commit: f4760c320a29be62469799355e76efa42d0b6bb2
Parents: 07d40d7 2a60897
Author: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Authored: Sun Mar 12 19:56:48 2017 -0700
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 19:56:48 2017 -0700

----------------------------------------------------------------------
 .github/PULL_REQUEST_TEMPLATE.md                |   6 +-
 CHANGELOG.txt                                   |  35 +-
 airflow/__init__.py                             |   8 +-
 airflow/api/client/local_client.py              |   2 +-
 airflow/bin/cli.py                              |   9 +-
 airflow/configuration.py                        |   8 +-
 airflow/contrib/auth/backends/ldap_auth.py      |  26 +-
 airflow/contrib/hooks/__init__.py               |   1 +
 airflow/contrib/hooks/gcp_dataflow_hook.py      |  33 +-
 airflow/contrib/hooks/spark_submit_hook.py      | 226 ++++++++++
 airflow/contrib/operators/__init__.py           |   1 +
 airflow/contrib/operators/dataflow_operator.py  |  85 +++-
 .../contrib/operators/spark_submit_operator.py  | 112 +++++
 .../contrib/operators/ssh_execute_operator.py   |   2 +-
 airflow/hooks/postgres_hook.py                  |  19 +-
 airflow/jobs.py                                 | 437 ++++++++++---------
 airflow/migrations/env.py                       |   6 +-
 airflow/models.py                               |  85 +++-
 airflow/operators/dagrun_operator.py            |   3 +-
 airflow/operators/s3_to_hive_operator.py        | 151 +++++--
 airflow/operators/sensors.py                    |   4 +-
 airflow/plugins_manager.py                      |   4 +-
 airflow/ti_deps/deps/trigger_rule_dep.py        |   6 +-
 airflow/utils/compression.py                    |  38 ++
 airflow/utils/db.py                             |   8 +-
 airflow/version.py                              |   2 +-
 airflow/www/utils.py                            |   4 +-
 airflow/www/views.py                            |  30 +-
 run_unit_tests.sh                               |  36 +-
 tests/contrib/hooks/gcp_dataflow_hook.py        |  56 +++
 tests/contrib/hooks/spark_submit_hook.py        | 148 +++++++
 tests/contrib/operators/dataflow_operator.py    |  76 ++++
 .../contrib/operators/spark_submit_operator.py  |  75 ++++
 tests/core.py                                   |  28 +-
 tests/dags/test_dagrun_short_circuit_false.py   |  38 ++
 tests/dags/test_double_trigger.py               |  29 ++
 tests/dags/test_issue_1225.py                   |  13 +
 tests/executor/__init__.py                      |  13 -
 tests/executor/test_executor.py                 |  33 --
 tests/executors/__init__.py                     |  13 +
 tests/executors/test_executor.py                |  56 +++
 tests/jobs.py                                   | 218 ++++++++-
 tests/models.py                                 | 173 +++++++-
 tests/operators/__init__.py                     |   1 +
 tests/operators/s3_to_hive_operator.py          | 247 +++++++++++
 tests/operators/subdag_operator.py              |   4 +-
 tests/utils/__init__.py                         |   1 +
 tests/utils/compression.py                      |  97 ++++
 48 files changed, 2280 insertions(+), 426 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/f4760c32/airflow/version.py
----------------------------------------------------------------------
diff --cc airflow/version.py
index c280ed0,8f87df9..cdbe073
--- a/airflow/version.py
+++ b/airflow/version.py
@@@ -13,4 -13,4 +13,4 @@@
  # limitations under the License.
  #
  
- version = '1.8.0rc4+apache.incubating'
 -version = '1.8.1alpha0'
++version = '1.8.0rc5+apache.incubating'


[29/45] incubator-airflow git commit: [AIRFLOW-933] use ast.literal_eval rather than eval because ast.literal_eval does not execute input.

Posted by bo...@apache.org.
[AIRFLOW-933] use ast.literal_eval rather than eval because ast.literal_eval does not execute input.

This PR addresses the following issues:
- *(https://issues.apache.org/jira/browse/AIRFLOW-933)*

This PR fixes a security issue. The fix was tested by setting up a
local web server and reproducing the issue described in the JIRA link
above.
Closes #2117 from amaliujia/master
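
For illustration, a minimal standalone sketch of the difference (the
default_params value below is hypothetical):

    import ast

    user_input = "{'pool': 'default_pool', 'priority': 3}"  # hypothetical default_params

    safe = ast.literal_eval(user_input)   # only Python literals are accepted
    print(safe['priority'])               # 3

    try:
        ast.literal_eval("__import__('os').system('id')")
    except ValueError:
        print("rejected: not a literal")  # eval() would have executed the call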


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/0964f189
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/0964f189
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/0964f189

Branch: refs/heads/v1-8-stable
Commit: 0964f189f2cd2ac10150040670a542910370e456
Parents: f04ea97
Author: Rui Wang <ru...@airbnb.com>
Authored: Wed Mar 1 14:03:34 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:21:01 2017 -0700

----------------------------------------------------------------------
 airflow/www/views.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/0964f189/airflow/www/views.py
----------------------------------------------------------------------
diff --git a/airflow/www/views.py b/airflow/www/views.py
index 86b1291..d8acfef 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -44,6 +44,7 @@ from flask._compat import PY2
 import jinja2
 import markdown
 import nvd3
+import ast
 
 from wtforms import (
     Form, SelectField, TextAreaField, PasswordField, StringField, validators)
@@ -168,7 +169,7 @@ def nobr_f(v, c, m, p):
 
 def label_link(v, c, m, p):
     try:
-        default_params = eval(m.default_params)
+        default_params = ast.literal_eval(m.default_params)
     except:
         default_params = {}
     url = url_for(


[38/45] incubator-airflow git commit: [AIRFLOW-900] Fixes bugs in LocalTaskJob for double run protection

Posted by bo...@apache.org.
[AIRFLOW-900] Fixes bugs in LocalTaskJob for double run protection

Right now, a second task instance being triggered will cause both
itself and the original task to run, because the hostname and pid
fields are updated regardless of whether the task is already running.
Also, the pid field is not refreshed from the db properly, and we
should check against the parent's pid.

Will be followed up by working tests.

Closes #2102 from saguziel/aguziel-fix-trigger-2
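
The descendant check added in jobs.py relies on psutil's process tree;
a minimal standalone sketch of the same logic (assuming psutil is
installed):

    import os
    import psutil

    def is_descendant_process(pid):
        """True if pid belongs to a (recursive) child of the current process."""
        try:
            return psutil.Process(pid) in psutil.Process().children(recursive=True)
        except psutil.NoSuchProcess:
            return False

    # The current process is not its own child, so a pid recorded by another
    # worker fails the check and the duplicate run raises AirflowException
    # instead of clobbering the original task instance.
    print(is_descendant_process(os.getpid()))   # False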


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/1243ab16
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/1243ab16
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/1243ab16

Branch: refs/heads/v1-8-stable
Commit: 1243ab16849ab9716b26aeba6a11ea3e9e9a81ca
Parents: a8f2c27
Author: Alex Guziel <al...@airbnb.com>
Authored: Sat Mar 11 10:54:39 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:34:45 2017 -0700

----------------------------------------------------------------------
 airflow/jobs.py                 | 41 ++++++++++++++-----------
 airflow/models.py               |  2 ++
 tests/core.py                   | 59 ++++++++++++++++++++++++++++++++++++
 tests/dags/sleep_forever_dag.py | 29 ++++++++++++++++++
 4 files changed, 113 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1243ab16/airflow/jobs.py
----------------------------------------------------------------------
diff --git a/airflow/jobs.py b/airflow/jobs.py
index c61b229..222d9ba 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -2072,15 +2072,6 @@ class LocalTaskJob(BaseJob):
         try:
             self.task_runner.start()
 
-            ti = self.task_instance
-            session = settings.Session()
-            if self.task_runner.process:
-                ti.pid = self.task_runner.process.pid
-            ti.hostname = socket.getfqdn()
-            session.merge(ti)
-            session.commit()
-            session.close()
-
             last_heartbeat_time = time.time()
             heartbeat_time_limit = conf.getint('scheduler',
                                                'scheduler_zombie_task_threshold')
@@ -2120,6 +2111,18 @@ class LocalTaskJob(BaseJob):
         self.task_runner.terminate()
         self.task_runner.on_finish()
 
+    def _is_descendant_process(self, pid):
+        """Checks if pid is a descendant of the current process.
+
+        :param pid: process id to check
+        :type pid: int
+        :rtype: bool
+        """
+        try:
+            return psutil.Process(pid) in psutil.Process().children(recursive=True)
+        except psutil.NoSuchProcess:
+            return False
+
     @provide_session
     def heartbeat_callback(self, session=None):
         """Self destruct task if state has been moved away from running externally"""
@@ -2133,15 +2136,17 @@ class LocalTaskJob(BaseJob):
         if ti.state == State.RUNNING:
             self.was_running = True
             fqdn = socket.getfqdn()
-            if not (fqdn == ti.hostname and
-                    self.task_runner.process.pid == ti.pid):
-                logging.warning("Recorded hostname and pid of {ti.hostname} "
-                                "and {ti.pid} do not match this instance's "
-                                "which are {fqdn} and "
-                                "{self.task_runner.process.pid}. "
-                                "Taking the poison pill. So long."
-                                .format(**locals()))
-                raise AirflowException("Another worker/process is running this job")
+            if fqdn != ti.hostname:
+                logging.warning("The recorded hostname {ti.hostname} "
+                                "does not match this instance's hostname "
+                                "{fqdn}".format(**locals()))
+                raise AirflowException("Hostname of job runner does not match")
+            elif not self._is_descendant_process(ti.pid):
+                current_pid = os.getpid()
+                logging.warning("Recorded pid {ti.pid} is not a "
+                                "descendant of the current pid "
+                                "{current_pid}".format(**locals()))
+                raise AirflowException("PID of job runner does not match")
         elif (self.was_running
               and self.task_runner.return_code() is None
               and hasattr(self.task_runner, 'process')):

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1243ab16/airflow/models.py
----------------------------------------------------------------------
diff --git a/airflow/models.py b/airflow/models.py
index 32c52ac..7c6590f 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -997,6 +997,7 @@ class TaskInstance(Base):
             self.end_date = ti.end_date
             self.try_number = ti.try_number
             self.hostname = ti.hostname
+            self.pid = ti.pid
         else:
             self.state = None
 
@@ -1320,6 +1321,7 @@ class TaskInstance(Base):
         if not test_mode:
             session.add(Log(State.RUNNING, self))
         self.state = State.RUNNING
+        self.pid = os.getpid()
         self.end_date = None
         if not test_mode:
             session.merge(self)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1243ab16/tests/core.py
----------------------------------------------------------------------
diff --git a/tests/core.py b/tests/core.py
index ee7a738..636ad43 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -26,6 +26,7 @@ from datetime import datetime, time, timedelta
 from email.mime.multipart import MIMEMultipart
 from email.mime.application import MIMEApplication
 import signal
+from time import time as timetime
 from time import sleep
 import warnings
 
@@ -895,6 +896,64 @@ class CoreTest(unittest.TestCase):
                 trigger_rule="non_existant",
                 dag=self.dag)
 
+    def test_run_task_twice(self):
+        """If two copies of a TI run, the new one should die, and old should live"""
+        dagbag = models.DagBag(
+            dag_folder=TEST_DAG_FOLDER,
+            include_examples=False,
+        )
+        TI = models.TaskInstance
+        dag = dagbag.dags.get('sleep_forever_dag')
+        task = dag.task_dict.get('sleeps_forever')
+    
+        ti = TI(task=task, execution_date=DEFAULT_DATE)
+        job1 = jobs.LocalTaskJob(
+            task_instance=ti, ignore_ti_state=True, executor=SequentialExecutor())
+        job2 = jobs.LocalTaskJob(
+            task_instance=ti, ignore_ti_state=True, executor=SequentialExecutor())
+
+        p1 = multiprocessing.Process(target=job1.run)
+        p2 = multiprocessing.Process(target=job2.run)
+        try:
+            p1.start()
+            start_time = timetime()
+            sleep(5.0) # must wait for session to be created on p1
+            settings.engine.dispose()
+            session = settings.Session()
+            ti.refresh_from_db(session=session)
+            self.assertEqual(State.RUNNING, ti.state)
+            p1pid = ti.pid
+            settings.engine.dispose()
+            p2.start()
+            p2.join(5) # wait 5 seconds until termination
+            self.assertFalse(p2.is_alive())
+            self.assertTrue(p1.is_alive())
+
+            settings.engine.dispose()
+            session = settings.Session()
+            ti.refresh_from_db(session=session)
+            self.assertEqual(State.RUNNING, ti.state)
+            self.assertEqual(p1pid, ti.pid)
+
+            # check changing hostname kills task
+            ti.refresh_from_db(session=session, lock_for_update=True)
+            ti.hostname = 'nonexistenthostname'
+            session.merge(ti)
+            session.commit()
+
+            p1.join(5)
+            self.assertFalse(p1.is_alive())
+        finally:
+            try:
+                p1.terminate()
+            except AttributeError:
+                pass # process already terminated
+            try:
+                p2.terminate()
+            except AttributeError:
+                pass # process already terminated
+            session.close()
+
     def test_terminate_task(self):
         """If a task instance's db state get deleted, it should fail"""
         TI = models.TaskInstance

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1243ab16/tests/dags/sleep_forever_dag.py
----------------------------------------------------------------------
diff --git a/tests/dags/sleep_forever_dag.py b/tests/dags/sleep_forever_dag.py
new file mode 100644
index 0000000..b1f810e
--- /dev/null
+++ b/tests/dags/sleep_forever_dag.py
@@ -0,0 +1,29 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Used for unit tests"""
+import airflow
+from airflow.operators.bash_operator import BashOperator
+from airflow.models import DAG
+
+dag = DAG(
+    dag_id='sleep_forever_dag',
+    schedule_interval=None,
+)
+
+task = BashOperator(
+    task_id='sleeps_forever',
+    dag=dag,
+    bash_command="sleep 10000000000",
+    start_date=airflow.utils.dates.days_ago(2),
+    owner='airflow')


[18/45] incubator-airflow git commit: [AIRFLOW-694] Fix config behaviour for empty envvar

Posted by bo...@apache.org.
[AIRFLOW-694] Fix config behaviour for empty envvar

Currently, an environment variable with an empty value does not
overwrite the corresponding configuration value.

Closes #2044 from sekikn/AIRFLOW-694
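
A minimal standalone sketch of the old versus new check (the variable
names are illustrative, not the actual configuration code):

    import os

    os.environ["AIRFLOW__CORE__FERNET_KEY"] = ""   # deliberately set to empty

    option = os.environ.get("AIRFLOW__CORE__FERNET_KEY")

    if option:                # old check: "" is falsy, so the env var was ignored
        print("use env var")

    if option is not None:    # new check: an empty value still overrides the file
        print("use env var, even when empty")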


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/5405f5f8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/5405f5f8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/5405f5f8

Branch: refs/heads/v1-8-stable
Commit: 5405f5f83c6e20fff2dc209cd4be3d1d5ea85140
Parents: a7abcf3
Author: Kengo Seki <se...@nttdata.co.jp>
Authored: Thu Feb 2 14:38:29 2017 +0100
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 07:56:15 2017 -0700

----------------------------------------------------------------------
 airflow/configuration.py |  2 +-
 tests/core.py            | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/5405f5f8/airflow/configuration.py
----------------------------------------------------------------------
diff --git a/airflow/configuration.py b/airflow/configuration.py
index 011f764..404808b 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -591,7 +591,7 @@ class AirflowConfigParser(ConfigParser):
 
         # first check environment variables
         option = self._get_env_var_option(section, key)
-        if option:
+        if option is not None:
             return option
 
         # ...then the config file

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/5405f5f8/tests/core.py
----------------------------------------------------------------------
diff --git a/tests/core.py b/tests/core.py
index e35809d..3e76e81 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -776,6 +776,30 @@ class CoreTest(unittest.TestCase):
         configuration.set("core", "FERNET_KEY", FERNET_KEY)
         assert configuration.has_option("core", "FERNET_KEY")
 
+    def test_config_override_original_when_non_empty_envvar_is_provided(self):
+        key = "AIRFLOW__CORE__FERNET_KEY"
+        value = "some value"
+        assert key not in os.environ
+
+        os.environ[key] = value
+        FERNET_KEY = configuration.get('core', 'FERNET_KEY')
+        assert FERNET_KEY == value
+
+        # restore the envvar back to the original state
+        del os.environ[key]
+
+    def test_config_override_original_when_empty_envvar_is_provided(self):
+        key = "AIRFLOW__CORE__FERNET_KEY"
+        value = ""
+        assert key not in os.environ
+
+        os.environ[key] = value
+        FERNET_KEY = configuration.get('core', 'FERNET_KEY')
+        assert FERNET_KEY == value
+
+        # restore the envvar back to the original state
+        del os.environ[key]
+
     def test_class_with_logger_should_have_logger_with_correct_name(self):
 
         # each class should automatically receive a logger with a correct name


[26/45] incubator-airflow git commit: [AIRFLOW-802][AIRFLOW-1] Add spark-submit operator/hook

Posted by bo...@apache.org.
[AIRFLOW-802][AIRFLOW-1] Add spark-submit operator/hook

Add an operator for spark-submit to kick off Apache Spark jobs by
using Airflow. This allows the user to maintain the configuration of
the master and yarn queue within Airflow by using connections. Add a
default connection_id to the initdb routine to set spark to yarn by
default. Add unit tests to verify the behaviour of the spark-submit
operator and hook.

Closes #2042 from Fokko/airflow-802
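
A minimal usage sketch, assuming a spark-submit binary on the PATH and
the 'spark_default' connection seeded by initdb in this commit (the DAG
id and application path are hypothetical):

    from datetime import datetime
    from airflow import DAG
    from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

    dag = DAG('spark_submit_example',
              start_date=datetime(2017, 3, 1),
              schedule_interval=None)

    submit = SparkSubmitOperator(
        task_id='submit_job',
        application='/path/to/job.py',   # hypothetical path to the Spark job
        conn_id='spark_default',
        executor_cores=2,
        executor_memory='2g',
        name='example-spark-job',
        dag=dag,
    )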


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/01494fd4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/01494fd4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/01494fd4

Branch: refs/heads/v1-8-stable
Commit: 01494fd4c0633dbb57f231ee17e015f42a5ecf24
Parents: c29af46
Author: Fokko Driesprong <fo...@godatadriven.com>
Authored: Mon Feb 27 13:45:24 2017 +0100
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:19:37 2017 -0700

----------------------------------------------------------------------
 airflow/contrib/hooks/__init__.py               |   1 +
 airflow/contrib/hooks/spark_submit_hook.py      | 226 +++++++++++++++++++
 airflow/contrib/operators/__init__.py           |   1 +
 .../contrib/operators/spark_submit_operator.py  | 112 +++++++++
 airflow/utils/db.py                             |   4 +
 tests/contrib/hooks/spark_submit_hook.py        | 148 ++++++++++++
 .../contrib/operators/spark_submit_operator.py  |  75 ++++++
 7 files changed, 567 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/01494fd4/airflow/contrib/hooks/__init__.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/hooks/__init__.py b/airflow/contrib/hooks/__init__.py
index a16a3f7..19fc2b4 100644
--- a/airflow/contrib/hooks/__init__.py
+++ b/airflow/contrib/hooks/__init__.py
@@ -42,6 +42,7 @@ _hooks = {
     'datastore_hook': ['DatastoreHook'],
     'gcp_dataproc_hook': ['DataProcHook'],
     'gcp_dataflow_hook': ['DataFlowHook'],
+    'spark_submit_operator': ['SparkSubmitOperator'],
     'cloudant_hook': ['CloudantHook'],
     'fs_hook': ['FSHook']
 }

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/01494fd4/airflow/contrib/hooks/spark_submit_hook.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/hooks/spark_submit_hook.py b/airflow/contrib/hooks/spark_submit_hook.py
new file mode 100644
index 0000000..619cc71
--- /dev/null
+++ b/airflow/contrib/hooks/spark_submit_hook.py
@@ -0,0 +1,226 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+import logging
+import subprocess
+import re
+
+from airflow.hooks.base_hook import BaseHook
+from airflow.exceptions import AirflowException
+
+log = logging.getLogger(__name__)
+
+
+class SparkSubmitHook(BaseHook):
+    """
+    This hook is a wrapper around the spark-submit binary to kick off a spark-submit job.
+    It requires that the "spark-submit" binary is in the PATH.
+    :param conf: Arbitrary Spark configuration properties
+    :type conf: dict
+    :param conn_id: The connection id as configured in Airflow administration. When an
+                    invalid connection_id is supplied, it will default to yarn.
+    :type conn_id: str
+    :param files: Upload additional files to the container running the job, separated by a
+                  comma. For example hive-site.xml.
+    :type files: str
+    :param py_files: Additional python files used by the job, can be .zip, .egg or .py.
+    :type py_files: str
+    :param jars: Submit additional jars to upload and place them in executor classpath.
+    :type jars: str
+    :param executor_cores: Number of cores per executor (Default: 2)
+    :type executor_cores: int
+    :param executor_memory: Memory per executor (e.g. 1000M, 2G) (Default: 1G)
+    :type executor_memory: str
+    :param keytab: Full path to the file that contains the keytab
+    :type keytab: str
+    :param principal: The name of the kerberos principal used for keytab
+    :type principal: str
+    :param name: Name of the job (default airflow-spark)
+    :type name: str
+    :param num_executors: Number of executors to launch
+    :type num_executors: int
+    :param verbose: Whether to pass the verbose flag to spark-submit process for debugging
+    :type verbose: bool
+    """
+
+    def __init__(self,
+                 conf=None,
+                 conn_id='spark_default',
+                 files=None,
+                 py_files=None,
+                 jars=None,
+                 executor_cores=None,
+                 executor_memory=None,
+                 keytab=None,
+                 principal=None,
+                 name='default-name',
+                 num_executors=None,
+                 verbose=False):
+        self._conf = conf
+        self._conn_id = conn_id
+        self._files = files
+        self._py_files = py_files
+        self._jars = jars
+        self._executor_cores = executor_cores
+        self._executor_memory = executor_memory
+        self._keytab = keytab
+        self._principal = principal
+        self._name = name
+        self._num_executors = num_executors
+        self._verbose = verbose
+        self._sp = None
+        self._yarn_application_id = None
+
+        (self._master, self._queue, self._deploy_mode) = self._resolve_connection()
+        self._is_yarn = 'yarn' in self._master
+
+    def _resolve_connection(self):
+        # Build from connection master or default to yarn if not available
+        master = 'yarn'
+        queue = None
+        deploy_mode = None
+
+        try:
+            # Master can be local, yarn, spark://HOST:PORT or mesos://HOST:PORT
+            conn = self.get_connection(self._conn_id)
+            if conn.port:
+                master = "{}:{}".format(conn.host, conn.port)
+            else:
+                master = conn.host
+
+            # Determine optional yarn queue from the extra field
+            extra = conn.extra_dejson
+            if 'queue' in extra:
+                queue = extra['queue']
+            if 'deploy-mode' in extra:
+                deploy_mode = extra['deploy-mode']
+        except AirflowException:
+            logging.debug(
+                "Could not load connection string {}, defaulting to {}".format(
+                    self._conn_id, master
+                )
+            )
+
+        return master, queue, deploy_mode
+
+    def get_conn(self):
+        pass
+
+    def _build_command(self, application):
+        """
+        Construct the spark-submit command to execute.
+        :param application: command to append to the spark-submit command
+        :type application: str
+        :return: full command to be executed
+        """
+        # The spark-submit binary needs to be in the path
+        connection_cmd = ["spark-submit"]
+
+        # The URL of the spark master
+        connection_cmd += ["--master", self._master]
+
+        if self._conf:
+            for key in self._conf:
+                connection_cmd += ["--conf", "{}={}".format(key, str(self._conf[key]))]
+        if self._files:
+            connection_cmd += ["--files", self._files]
+        if self._py_files:
+            connection_cmd += ["--py-files", self._py_files]
+        if self._jars:
+            connection_cmd += ["--jars", self._jars]
+        if self._num_executors:
+            connection_cmd += ["--num-executors", str(self._num_executors)]
+        if self._executor_cores:
+            connection_cmd += ["--executor-cores", str(self._executor_cores)]
+        if self._executor_memory:
+            connection_cmd += ["--executor-memory", self._executor_memory]
+        if self._keytab:
+            connection_cmd += ["--keytab", self._keytab]
+        if self._principal:
+            connection_cmd += ["--principal", self._principal]
+        if self._name:
+            connection_cmd += ["--name", self._name]
+        if self._verbose:
+            connection_cmd += ["--verbose"]
+        if self._queue:
+            connection_cmd += ["--queue", self._queue]
+        if self._deploy_mode:
+            connection_cmd += ["--deploy-mode", self._deploy_mode]
+
+        # The actual script to execute
+        connection_cmd += [application]
+
+        logging.debug("Spark-Submit cmd: {}".format(connection_cmd))
+
+        return connection_cmd
+
+    def submit(self, application="", **kwargs):
+        """
+        Remote Popen to execute the spark-submit job
+
+        :param application: Submitted application, jar or py file
+        :type application: str
+        :param kwargs: extra arguments to Popen (see subprocess.Popen)
+        """
+        spark_submit_cmd = self._build_command(application)
+        self._sp = subprocess.Popen(spark_submit_cmd,
+                                    stdout=subprocess.PIPE,
+                                    stderr=subprocess.PIPE,
+                                    **kwargs)
+
+        # Using two iterators here to support 'real-time' logging
+        sources = [self._sp.stdout, self._sp.stderr]
+
+        for source in sources:
+            self._process_log(iter(source.readline, b''))
+
+        output, stderr = self._sp.communicate()
+
+        if self._sp.returncode:
+            raise AirflowException(
+                "Cannot execute: {}. Error code is: {}. Output: {}, Stderr: {}".format(
+                    spark_submit_cmd, self._sp.returncode, output, stderr
+                )
+            )
+
+    def _process_log(self, itr):
+        """
+        Processes the log files and extracts useful information out of it
+
+        :param itr: An iterator which iterates over the input of the subprocess
+        """
+        # Consume the iterator
+        for line in itr:
+            line = line.decode('utf-8').strip()
+            # If we run yarn cluster mode, we want to extract the application id from
+            # the logs so we can kill the application when we stop it unexpectedly
+            if self._is_yarn and self._deploy_mode == 'cluster':
+                match = re.search('(application[0-9_]+)', line)
+                if match:
+                    self._yarn_application_id = match.groups()[0]
+
+            # Pass to logging
+            logging.info(line)
+
+    def on_kill(self):
+        if self._sp and self._sp.poll() is None:
+            logging.info('Sending kill signal to spark-submit')
+            self.sp.kill()
+
+            if self._yarn_application_id:
+                logging.info('Killing application on YARN')
+                yarn_kill = Popen("yarn application -kill {0}".format(self._yarn_application_id),
+                                  stdout=subprocess.PIPE,
+                                  stderr=subprocess.PIPE)
+                logging.info("YARN killed with return code: {0}".format(yarn_kill.wait()))

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/01494fd4/airflow/contrib/operators/__init__.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/operators/__init__.py b/airflow/contrib/operators/__init__.py
index ae481ea..bef3433 100644
--- a/airflow/contrib/operators/__init__.py
+++ b/airflow/contrib/operators/__init__.py
@@ -36,6 +36,7 @@ _operators = {
     'vertica_operator': ['VerticaOperator'],
     'vertica_to_hive': ['VerticaToHiveTransfer'],
     'qubole_operator': ['QuboleOperator'],
+    'spark_submit_operator': ['SparkSubmitOperator'],
     'fs_operator': ['FileSensor']
 }
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/01494fd4/airflow/contrib/operators/spark_submit_operator.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/operators/spark_submit_operator.py b/airflow/contrib/operators/spark_submit_operator.py
new file mode 100644
index 0000000..a5e6145
--- /dev/null
+++ b/airflow/contrib/operators/spark_submit_operator.py
@@ -0,0 +1,112 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+import logging
+
+from airflow.contrib.hooks.spark_submit_hook import SparkSubmitHook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+log = logging.getLogger(__name__)
+
+
+class SparkSubmitOperator(BaseOperator):
+    """
+    This hook is a wrapper around the spark-submit binary to kick off a spark-submit job.
+    It requires that the "spark-submit" binary is in the PATH.
+    :param application: The application that submitted as a job, either jar or py file.
+    :type application: str
+    :param conf: Arbitrary Spark configuration properties
+    :type conf: dict
+    :param conn_id: The connection id as configured in Airflow administration. When an
+                    invalid connection_id is supplied, it will default to yarn.
+    :type conn_id: str
+    :param files: Upload additional files to the container running the job, separated by a
+                  comma. For example hive-site.xml.
+    :type files: str
+    :param py_files: Additional python files used by the job, can be .zip, .egg or .py.
+    :type py_files: str
+    :param jars: Submit additional jars to upload and place them in executor classpath.
+    :type jars: str
+    :param executor_cores: Number of cores per executor (Default: 2)
+    :type executor_cores: int
+    :param executor_memory: Memory per executor (e.g. 1000M, 2G) (Default: 1G)
+    :type executor_memory: str
+    :param keytab: Full path to the file that contains the keytab
+    :type keytab: str
+    :param principal: The name of the kerberos principal used for keytab
+    :type principal: str
+    :param name: Name of the job (default airflow-spark)
+    :type name: str
+    :param num_executors: Number of executors to launch
+    :type num_executors: int
+    :param verbose: Whether to pass the verbose flag to spark-submit process for debugging
+    :type verbose: bool
+    """
+
+    @apply_defaults
+    def __init__(self,
+                 application='',
+                 conf=None,
+                 conn_id='spark_default',
+                 files=None,
+                 py_files=None,
+                 jars=None,
+                 executor_cores=None,
+                 executor_memory=None,
+                 keytab=None,
+                 principal=None,
+                 name='airflow-spark',
+                 num_executors=None,
+                 verbose=False,
+                 *args,
+                 **kwargs):
+        super(SparkSubmitOperator, self).__init__(*args, **kwargs)
+        self._application = application
+        self._conf = conf
+        self._files = files
+        self._py_files = py_files
+        self._jars = jars
+        self._executor_cores = executor_cores
+        self._executor_memory = executor_memory
+        self._keytab = keytab
+        self._principal = principal
+        self._name = name
+        self._num_executors = num_executors
+        self._verbose = verbose
+        self._hook = None
+        self._conn_id = conn_id
+
+    def execute(self, context):
+        """
+        Call the SparkSubmitHook to run the provided spark job
+        """
+        self._hook = SparkSubmitHook(
+            conf=self._conf,
+            conn_id=self._conn_id,
+            files=self._files,
+            py_files=self._py_files,
+            jars=self._jars,
+            executor_cores=self._executor_cores,
+            executor_memory=self._executor_memory,
+            keytab=self._keytab,
+            principal=self._principal,
+            name=self._name,
+            num_executors=self._num_executors,
+            verbose=self._verbose
+        )
+        self._hook.submit(self._application)
+
+    def on_kill(self):
+        self._hook.on_kill()

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/01494fd4/airflow/utils/db.py
----------------------------------------------------------------------
diff --git a/airflow/utils/db.py b/airflow/utils/db.py
index 2502219..977a949 100644
--- a/airflow/utils/db.py
+++ b/airflow/utils/db.py
@@ -192,6 +192,10 @@ def initdb():
             extra='{"region_name": "us-east-1"}'))
     merge_conn(
         models.Connection(
+            conn_id='spark_default', conn_type='spark',
+            host='yarn', extra='{"queue": "root.default"}'))
+    merge_conn(
+        models.Connection(
             conn_id='emr_default', conn_type='emr',
             extra='''
                 {   "Name": "default_job_flow_name",

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/01494fd4/tests/contrib/hooks/spark_submit_hook.py
----------------------------------------------------------------------
diff --git a/tests/contrib/hooks/spark_submit_hook.py b/tests/contrib/hooks/spark_submit_hook.py
new file mode 100644
index 0000000..b18925a
--- /dev/null
+++ b/tests/contrib/hooks/spark_submit_hook.py
@@ -0,0 +1,148 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+
+from airflow import configuration, models
+from airflow.utils import db
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.spark_submit_hook import SparkSubmitHook
+
+
+class TestSparkSubmitHook(unittest.TestCase):
+    _spark_job_file = 'test_application.py'
+    _config = {
+        'conf': {
+            'parquet.compression': 'SNAPPY'
+        },
+        'conn_id': 'default_spark',
+        'files': 'hive-site.xml',
+        'py_files': 'sample_library.py',
+        'jars': 'parquet.jar',
+        'executor_cores': 4,
+        'executor_memory': '22g',
+        'keytab': 'privileged_user.keytab',
+        'principal': 'user/spark@airflow.org',
+        'name': 'spark-job',
+        'num_executors': 10,
+        'verbose': True
+    }
+
+    def setUp(self):
+        configuration.load_test_config()
+        db.merge_conn(
+            models.Connection(
+                conn_id='spark_yarn_cluster', conn_type='spark',
+                host='yarn://yarn-mater', extra='{"queue": "root.etl", "deploy-mode": "cluster"}')
+        )
+        db.merge_conn(
+            models.Connection(
+                conn_id='spark_default_mesos', conn_type='spark',
+                host='mesos://host', port=5050)
+        )
+
+    def test_build_command(self):
+        hook = SparkSubmitHook(**self._config)
+
+        # The subprocess requires an array but we build the cmd by joining on a space
+        cmd = ' '.join(hook._build_command(self._spark_job_file))
+
+        # Check if the URL gets build properly and everything exists.
+        assert self._spark_job_file in cmd
+
+        # Check all the parameters
+        assert "--files {}".format(self._config['files']) in cmd
+        assert "--py-files {}".format(self._config['py_files']) in cmd
+        assert "--jars {}".format(self._config['jars']) in cmd
+        assert "--executor-cores {}".format(self._config['executor_cores']) in cmd
+        assert "--executor-memory {}".format(self._config['executor_memory']) in cmd
+        assert "--keytab {}".format(self._config['keytab']) in cmd
+        assert "--principal {}".format(self._config['principal']) in cmd
+        assert "--name {}".format(self._config['name']) in cmd
+        assert "--num-executors {}".format(self._config['num_executors']) in cmd
+
+        # Check if all config settings are there
+        for k in self._config['conf']:
+            assert "--conf {0}={1}".format(k, self._config['conf'][k]) in cmd
+
+        if self._config['verbose']:
+            assert "--verbose" in cmd
+
+    def test_submit(self):
+        hook = SparkSubmitHook()
+
+        # We don't have spark-submit available, and this is hard to mock, so just accept
+        # an exception for now.
+        with self.assertRaises(AirflowException):
+            hook.submit(self._spark_job_file)
+
+    def test_resolve_connection(self):
+
+        # Default to the standard yarn connection because conn_id does not exist
+        hook = SparkSubmitHook(conn_id='')
+        self.assertEqual(hook._resolve_connection(), ('yarn', None, None))
+        assert "--master yarn" in ' '.join(hook._build_command(self._spark_job_file))
+
+        # Default to the standard yarn connection
+        hook = SparkSubmitHook(conn_id='spark_default')
+        self.assertEqual(
+            hook._resolve_connection(),
+            ('yarn', 'root.default', None)
+        )
+        cmd = ' '.join(hook._build_command(self._spark_job_file))
+        assert "--master yarn" in cmd
+        assert "--queue root.default" in cmd
+
+        # Connect to a mesos master
+        hook = SparkSubmitHook(conn_id='spark_default_mesos')
+        self.assertEqual(
+            hook._resolve_connection(),
+            ('mesos://host:5050', None, None)
+        )
+
+        cmd = ' '.join(hook._build_command(self._spark_job_file))
+        assert "--master mesos://host:5050" in cmd
+
+        # Set specific queue and deploy mode
+        hook = SparkSubmitHook(conn_id='spark_yarn_cluster')
+        self.assertEqual(
+            hook._resolve_connection(),
+            ('yarn://yarn-master', 'root.etl', 'cluster')
+        )
+
+        cmd = ' '.join(hook._build_command(self._spark_job_file))
+        assert "--master yarn://yarn-master" in cmd
+        assert "--queue root.etl" in cmd
+        assert "--deploy-mode cluster" in cmd
+
+    def test_process_log(self):
+        # Must select yarn connection
+        hook = SparkSubmitHook(conn_id='spark_yarn_cluster')
+
+        log_lines = [
+            'SPARK_MAJOR_VERSION is set to 2, using Spark2',
+            'WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable',
+            'WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.',
+            'INFO Client: Requesting a new application from cluster with 10 NodeManagers',
+            'INFO Client: Submitting application application_1486558679801_1820 to ResourceManager'
+        ]
+
+        hook._process_log(log_lines)
+
+        assert hook._yarn_application_id == 'application_1486558679801_1820'
+
+
+if __name__ == '__main__':
+    unittest.main()

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/01494fd4/tests/contrib/operators/spark_submit_operator.py
----------------------------------------------------------------------
diff --git a/tests/contrib/operators/spark_submit_operator.py b/tests/contrib/operators/spark_submit_operator.py
new file mode 100644
index 0000000..c080f76
--- /dev/null
+++ b/tests/contrib/operators/spark_submit_operator.py
@@ -0,0 +1,75 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+import datetime
+
+from airflow import DAG, configuration
+from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
+
+DEFAULT_DATE = datetime.datetime(2017, 1, 1)
+
+
+class TestSparkSubmitOperator(unittest.TestCase):
+    _config = {
+        'conf': {
+            'parquet.compression': 'SNAPPY'
+        },
+        'files': 'hive-site.xml',
+        'py_files': 'sample_library.py',
+        'jars': 'parquet.jar',
+        'executor_cores': 4,
+        'executor_memory': '22g',
+        'keytab': 'privileged_user.keytab',
+        'principal': 'user/spark@airflow.org',
+        'name': 'spark-job',
+        'num_executors': 10,
+        'verbose': True,
+        'application': 'test_application.py'
+    }
+
+    def setUp(self):
+        configuration.load_test_config()
+        args = {
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        }
+        self.dag = DAG('test_dag_id', default_args=args)
+
+    def test_execute(self, conn_id='spark_default'):
+        operator = SparkSubmitOperator(
+            task_id='spark_submit_job',
+            dag=self.dag,
+            **self._config
+        )
+
+        self.assertEqual(conn_id, operator._conn_id)
+
+        self.assertEqual(self._config['application'], operator._application)
+        self.assertEqual(self._config['conf'], operator._conf)
+        self.assertEqual(self._config['files'], operator._files)
+        self.assertEqual(self._config['py_files'], operator._py_files)
+        self.assertEqual(self._config['jars'], operator._jars)
+        self.assertEqual(self._config['executor_cores'], operator._executor_cores)
+        self.assertEqual(self._config['executor_memory'], operator._executor_memory)
+        self.assertEqual(self._config['keytab'], operator._keytab)
+        self.assertEqual(self._config['principal'], operator._principal)
+        self.assertEqual(self._config['name'], operator._name)
+        self.assertEqual(self._config['num_executors'], operator._num_executors)
+        self.assertEqual(self._config['verbose'], operator._verbose)
+
+
+if __name__ == '__main__':
+    unittest.main()


[43/45] incubator-airflow git commit: Fix postgres hook

Posted by bo...@apache.org.
Fix postgres hook


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/f171d17e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/f171d17e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/f171d17e

Branch: refs/heads/v1-8-stable
Commit: f171d17e8b5ef698f487bed8a40c6dd21ed81b51
Parents: 3927e00
Author: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Authored: Sun Mar 12 10:34:19 2017 -0700
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 10:34:19 2017 -0700

----------------------------------------------------------------------
 airflow/hooks/postgres_hook.py | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/f171d17e/airflow/hooks/postgres_hook.py
----------------------------------------------------------------------
diff --git a/airflow/hooks/postgres_hook.py b/airflow/hooks/postgres_hook.py
index 750ebbb..584930d 100644
--- a/airflow/hooks/postgres_hook.py
+++ b/airflow/hooks/postgres_hook.py
@@ -28,6 +28,10 @@ class PostgresHook(DbApiHook):
     default_conn_name = 'postgres_default'
     supports_autocommit = True
 
+    def __init__(self, *args, **kwargs):
+        super(PostgresHook, self).__init__(*args, **kwargs)
+        self.schema = kwargs.pop("schema", None)
+
     def get_conn(self):
         conn = self.get_connection(self.postgres_conn_id)
         conn_args = dict(


[24/45] incubator-airflow git commit: [AIRFLOW-861] make pickle_info endpoint be login_required

Posted by bo...@apache.org.
[AIRFLOW-861] make pickle_info endpoint be login_required

Testing Done:
- Unittests pass

Closes #2077 from saguziel/aguziel-fix-login-required


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/ff0fa00d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/ff0fa00d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/ff0fa00d

Branch: refs/heads/v1-8-stable
Commit: ff0fa00d82bfebbe9b2b9ff957e4d77db0891e7f
Parents: 1017008
Author: Alex Guziel <al...@airbnb.com>
Authored: Fri Feb 17 11:45:45 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:14:26 2017 -0700

----------------------------------------------------------------------
 airflow/www/views.py | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ff0fa00d/airflow/www/views.py
----------------------------------------------------------------------
diff --git a/airflow/www/views.py b/airflow/www/views.py
index 0391775..bda4921 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -640,6 +640,7 @@ class Airflow(BaseView):
         return wwwutils.json_response(d)
 
     @expose('/pickle_info')
+    @login_required
     def pickle_info(self):
         d = {}
         dag_id = request.args.get('dag_id')


[05/45] incubator-airflow git commit: Add pool upgrade issue description

Posted by bo...@apache.org.
Add pool upgrade issue description

(cherry picked from commit e63cb1fced9517397b7db9e2849bf01fcca63902)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/b3d4e711
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/b3d4e711
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/b3d4e711

Branch: refs/heads/v1-8-stable
Commit: b3d4e7114fd7f1943aee2e5f865cf27cffedd0ee
Parents: adaebc2
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Thu Feb 9 16:10:17 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Thu Feb 9 16:12:32 2017 +0100

----------------------------------------------------------------------
 UPDATING.md | 6 ++++++
 1 file changed, 6 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b3d4e711/UPDATING.md
----------------------------------------------------------------------
diff --git a/UPDATING.md b/UPDATING.md
index 337b711..b56aca8 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -14,6 +14,12 @@ Systemd unit files have been updated. If you use systemd please make sure to upd
 
 > Please note that the webserver does not detach properly, this will be fixed in a future version.
 
+### Tasks not starting although dependencies are met due to stricter pool checking
+Airflow 1.7.1 has issues with being able to oversubscribe a pool, i.e. more slots could be used than were
+available. This is fixed in Airflow 1.8.0, but due to this past issue jobs may fail to start even though their
+dependencies are met after an upgrade. To work around this, either temporarily increase the number of slots above
+the number of queued tasks or use a new pool.
+
 ### Less forgiving scheduler on dynamic start_date
 Using a dynamic start_date (e.g. `start_date = datetime.now()`) is not considered a best practice. The 1.8.0 scheduler
 is less forgiving in this area. If you encounter DAGs not being scheduled you can try using a fixed start_date and
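
A sketch of creating a fresh pool programmatically, assuming direct
access to the metadata database through airflow.settings (the pool name
and slot count are hypothetical); the same can be done from the Admin ->
Pools page in the UI:

    from airflow import models, settings

    session = settings.Session()
    session.add(models.Pool(pool='etl_pool', slots=32,
                            description='replacement pool after 1.8.0 upgrade'))
    session.commit()
    session.close()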


[28/45] incubator-airflow git commit: [AIRFLOW-925] Revert airflow.hooks change that cherry-pick picked

Posted by bo...@apache.org.
[AIRFLOW-925] Revert airflow.hooks change that cherry-pick picked

Please accept this PR that addresses the following issues:
- https://issues.apache.org/jira/browse/AIRFLOW-925

Testing Done:
- Fixes bug in prod

Closes #2112 from saguziel/aguziel-hivemetastorehook-import-apache


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/f04ea97d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/f04ea97d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/f04ea97d

Branch: refs/heads/v1-8-stable
Commit: f04ea97d066093abf898fec81f96eeb4b82eaf13
Parents: ab37f8d
Author: Li Xuanji <xu...@airbnb.com>
Authored: Tue Feb 28 12:17:33 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:20:19 2017 -0700

----------------------------------------------------------------------
 airflow/operators/sensors.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/f04ea97d/airflow/operators/sensors.py
----------------------------------------------------------------------
diff --git a/airflow/operators/sensors.py b/airflow/operators/sensors.py
index 5fbd21c..c0aba27 100644
--- a/airflow/operators/sensors.py
+++ b/airflow/operators/sensors.py
@@ -300,7 +300,7 @@ class NamedHivePartitionSensor(BaseSensorOperator):
     def poke(self, context):
 
         if not hasattr(self, 'hook'):
-            self.hook = airflow.hooks.hive_hooks.HiveMetastoreHook(
+            self.hook = hooks.HiveMetastoreHook(
                 metastore_conn_id=self.metastore_conn_id)
 
         def poke_partition(partition):
@@ -369,7 +369,7 @@ class HivePartitionSensor(BaseSensorOperator):
             'Poking for table {self.schema}.{self.table}, '
             'partition {self.partition}'.format(**locals()))
         if not hasattr(self, 'hook'):
-            self.hook = airflow.hooks.hive_hooks.HiveMetastoreHook(
+            self.hook = hooks.HiveMetastoreHook(
                 metastore_conn_id=self.metastore_conn_id)
         return self.hook.check_for_partition(
             self.schema, self.table, self.partition)


[44/45] incubator-airflow git commit: Update changelog for 1.8.0

Posted by bo...@apache.org.
Update changelog for 1.8.0


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/2a608972
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/2a608972
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/2a608972

Branch: refs/heads/v1-8-stable
Commit: 2a6089728841e1f4bb060345b5c251b3ff73d13d
Parents: f171d17
Author: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Authored: Sun Mar 12 19:48:04 2017 -0700
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 19:48:04 2017 -0700

----------------------------------------------------------------------
 CHANGELOG.txt | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2a608972/CHANGELOG.txt
----------------------------------------------------------------------
diff --git a/CHANGELOG.txt b/CHANGELOG.txt
index 8da887c..5048128 100644
--- a/CHANGELOG.txt
+++ b/CHANGELOG.txt
@@ -1,6 +1,39 @@
-AIRFLOW 1.8.0, 2017-02-02
+AIRFLOW 1.8.0, 2017-03-12
 -------------------------
 
+[AIRFLOW-900] Double trigger should not kill original task instance
+[AIRFLOW-900] Fixes bugs in LocalTaskJob for double run protection
+[AIRFLOW-932] Do not mark tasks removed when backfilling
+[AIRFLOW-961] run onkill when SIGTERMed
+[AIRFLOW-910] Use parallel task execution for backfills
+[AIRFLOW-967] Wrap strings in native for py2 ldap compatibility
+[AIRFLOW-941] Use defined parameters for psycopg2
+[AIRFLOW-719] Prevent DAGs from ending prematurely
+[AIRFLOW-938] Use test for True in task_stats queries
+[AIRFLOW-937] Improve performance of task_stats
+[AIRFLOW-933] Use ast.literal_eval rather than eval because ast.literal_eval does not execute input.
+[AIRFLOW-925] Revert airflow.hooks change that cherry-pick picked
+[AIRFLOW-919] Running tasks with no start date shouldn't break a DAG's UI
+[AIRFLOW-802] Add spark-submit operator/hook
+[AIRFLOW-897] Prevent dagruns from failing with unfinished tasks
+[AIRFLOW-861] make pickle_info endpoint be login_required
+[AIRFLOW-853] use utf8 encoding for stdout line decode
+[AIRFLOW-856] Make sure execution date is set for local client
+[AIRFLOW-830][AIRFLOW-829][AIRFLOW-88] Reduce Travis log verbosity
+[AIRFLOW-831] Restore import to fix broken tests
+[AIRFLOW-794] Access DAGS_FOLDER and SQL_ALCHEMY_CONN exclusively from settings
+[AIRFLOW-694] Fix config behaviour for empty envvar
+[AIRFLOW-365] Set dag.fileloc explicitly and use for Code view
+[AIRFLOW-931] Do not set QUEUED in TaskInstances
+[AIRFLOW-899] Tasks in SCHEDULED state should be white in the UI instead of black
+[AIRFLOW-895] Address Apache release incompliancies
+[AIRFLOW-893][AIRFLOW-510] Fix crashing webservers when a dagrun has no start date
+[AIRFLOW-793] Enable compressed loading in S3ToHiveTransfer
+[AIRFLOW-863] Example DAGs should have recent start dates
+[AIRFLOW-869] Refactor mark success functionality
+[AIRFLOW-856] Make sure execution date is set for local client
+[AIRFLOW-814] Fix Presto*CheckOperator.__init__
+[AIRFLOW-844] Fix cgroups directory creation
 [AIRFLOW-816] Use static nvd3 and d3
 [AIRFLOW-821] Fix py3 compatibility
 [AIRFLOW-817] Check for None value of execution_date in endpoint


[11/45] incubator-airflow git commit: [AIRFLOW-863] Example DAGs should have recent start dates

Posted by bo...@apache.org.
[AIRFLOW-863] Example DAGs should have recent start dates

Avoid unnecessary backfills by having start dates
of
just a few days ago. Adds a utility function
airflow.utils.dates.days_ago().

Closes #2068 from jlowin/example-start-date

(cherry picked from commit bbfd43df4663547abda4ac6fdc3a6ed730a75b57)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>
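For reference, a minimal sketch of the pattern the changed example DAGs now follow (the dag_id below is illustrative, not from the commit):

```python
import airflow
from airflow.models import DAG

args = {
    'owner': 'airflow',
    # midnight, two days ago; keeps example DAGs from triggering long backfills
    'start_date': airflow.utils.dates.days_ago(2),
}

dag = DAG(dag_id='example_days_ago', default_args=args, schedule_interval='@daily')
```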


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3658bf31
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3658bf31
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3658bf31

Branch: refs/heads/v1-8-stable
Commit: 3658bf310811cd22651b6c20c5d50bfbd3153025
Parents: 563cc9a
Author: Jeremiah Lowin <jl...@apache.org>
Authored: Sun Feb 12 15:37:56 2017 -0500
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Sat Feb 18 15:26:34 2017 +0100

----------------------------------------------------------------------
 .../example_emr_job_flow_automatic_steps.py     |  6 +--
 .../example_emr_job_flow_manual_steps.py        |  5 ++-
 .../example_dags/example_qubole_operator.py     |  6 +--
 .../contrib/example_dags/example_twitter_dag.py |  5 ++-
 airflow/example_dags/example_bash_operator.py   |  9 ++--
 airflow/example_dags/example_branch_operator.py |  8 ++--
 .../example_branch_python_dop_operator_3.py     |  5 +--
 airflow/example_dags/example_http_operator.py   |  7 ++-
 airflow/example_dags/example_latest_only.py     |  4 +-
 .../example_latest_only_with_trigger.py         |  4 +-
 .../example_passing_params_via_test_command.py  |  6 +--
 airflow/example_dags/example_python_operator.py |  7 +--
 .../example_short_circuit_operator.py           |  7 ++-
 airflow/example_dags/example_skip_dag.py        |  9 ++--
 airflow/example_dags/example_subdag_operator.py |  4 +-
 airflow/example_dags/example_xcom.py            |  9 ++--
 airflow/example_dags/test_utils.py              |  3 +-
 airflow/example_dags/tutorial.py                |  7 ++-
 airflow/utils/dates.py                          | 13 ++++++
 dags/test_dag.py                                |  2 +-
 scripts/perf/dags/perf_dag_1.py                 |  7 ++-
 scripts/perf/dags/perf_dag_2.py                 |  8 ++--
 tests/utils/__init__.py                         | 16 +++++++
 tests/utils/dates.py                            | 45 ++++++++++++++++++++
 24 files changed, 132 insertions(+), 70 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/contrib/example_dags/example_emr_job_flow_automatic_steps.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/example_dags/example_emr_job_flow_automatic_steps.py b/airflow/contrib/example_dags/example_emr_job_flow_automatic_steps.py
index 18399c7..7f57ad1 100644
--- a/airflow/contrib/example_dags/example_emr_job_flow_automatic_steps.py
+++ b/airflow/contrib/example_dags/example_emr_job_flow_automatic_steps.py
@@ -12,8 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from datetime import timedelta, datetime
-
+from datetime import timedelta
+import airflow
 from airflow import DAG
 from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
 from airflow.contrib.sensors.emr_job_flow_sensor import EmrJobFlowSensor
@@ -21,7 +21,7 @@ from airflow.contrib.sensors.emr_job_flow_sensor import EmrJobFlowSensor
 DEFAULT_ARGS = {
     'owner': 'airflow',
     'depends_on_past': False,
-    'start_date': datetime(2016, 3, 13),
+    'start_date': airflow.utils.dates.days_ago(2),
     'email': ['airflow@airflow.com'],
     'email_on_failure': False,
     'email_on_retry': False

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/contrib/example_dags/example_emr_job_flow_manual_steps.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/example_dags/example_emr_job_flow_manual_steps.py b/airflow/contrib/example_dags/example_emr_job_flow_manual_steps.py
index b498d50..caa6943 100644
--- a/airflow/contrib/example_dags/example_emr_job_flow_manual_steps.py
+++ b/airflow/contrib/example_dags/example_emr_job_flow_manual_steps.py
@@ -12,8 +12,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from datetime import timedelta, datetime
+from datetime import timedelta
 
+import airflow
 from airflow import DAG
 from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
 from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator
@@ -23,7 +24,7 @@ from airflow.contrib.operators.emr_terminate_job_flow_operator import EmrTermina
 DEFAULT_ARGS = {
     'owner': 'airflow',
     'depends_on_past': False,
-    'start_date': datetime(2016, 3, 13),
+    'start_date': airflow.utils.dates.days_ago(2),
     'email': ['airflow@airflow.com'],
     'email_on_failure': False,
     'email_on_retry': False

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/contrib/example_dags/example_qubole_operator.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/example_dags/example_qubole_operator.py b/airflow/contrib/example_dags/example_qubole_operator.py
index b482cf4..fce0175 100644
--- a/airflow/contrib/example_dags/example_qubole_operator.py
+++ b/airflow/contrib/example_dags/example_qubole_operator.py
@@ -16,17 +16,15 @@ from airflow import DAG
 from airflow.operators.dummy_operator import DummyOperator
 from airflow.operators.python_operator import PythonOperator, BranchPythonOperator
 from airflow.contrib.operators.qubole_operator import QuboleOperator
-from datetime import datetime, timedelta
 import filecmp
 import random
 
-seven_days_ago = datetime.combine(datetime.today() - timedelta(7),
-                                  datetime.min.time())
+
 
 default_args = {
     'owner': 'airflow',
     'depends_on_past': False,
-    'start_date': seven_days_ago,
+    'start_date': airflow.utils.dates.days_ago(2),
     'email': ['airflow@airflow.com'],
     'email_on_failure': False,
     'email_on_retry': False

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/contrib/example_dags/example_twitter_dag.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/example_dags/example_twitter_dag.py b/airflow/contrib/example_dags/example_twitter_dag.py
index d63b4e8..a25c8d0 100644
--- a/airflow/contrib/example_dags/example_twitter_dag.py
+++ b/airflow/contrib/example_dags/example_twitter_dag.py
@@ -22,11 +22,12 @@
 # Load The Dependencies
 # --------------------------------------------------------------------------------
 
+import airflow
 from airflow import DAG
 from airflow.operators.bash_operator import BashOperator
 from airflow.operators.python_operator import PythonOperator
 from airflow.operators.hive_operator import HiveOperator
-from datetime import datetime, date, timedelta
+from datetime import date, timedelta
 
 # --------------------------------------------------------------------------------
 # Create a few placeholder scripts. In practice these would be different python
@@ -57,7 +58,7 @@ def transfertodb():
 default_args = {
     'owner': 'Ekhtiar',
     'depends_on_past': False,
-    'start_date': datetime(2016, 3, 13),
+    'start_date': airflow.utils.dates.days_ago(5),
     'email': ['airflow@airflow.com'],
     'email_on_failure': False,
     'email_on_retry': False,

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_bash_operator.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_bash_operator.py b/airflow/example_dags/example_bash_operator.py
index 0d18bcf..6887fa9 100644
--- a/airflow/example_dags/example_bash_operator.py
+++ b/airflow/example_dags/example_bash_operator.py
@@ -11,17 +11,18 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
+import airflow
 from builtins import range
 from airflow.operators.bash_operator import BashOperator
 from airflow.operators.dummy_operator import DummyOperator
 from airflow.models import DAG
-from datetime import datetime, timedelta
+from datetime import timedelta
+
 
-seven_days_ago = datetime.combine(datetime.today() - timedelta(7),
-                                  datetime.min.time())
 args = {
     'owner': 'airflow',
-    'start_date': seven_days_ago,
+    'start_date': airflow.utils.dates.days_ago(2)
 }
 
 dag = DAG(

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_branch_operator.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_branch_operator.py b/airflow/example_dags/example_branch_operator.py
index cc559d0..2b11d91 100644
--- a/airflow/example_dags/example_branch_operator.py
+++ b/airflow/example_dags/example_branch_operator.py
@@ -11,17 +11,17 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
+import airflow
 from airflow.operators.python_operator import BranchPythonOperator
 from airflow.operators.dummy_operator import DummyOperator
 from airflow.models import DAG
-from datetime import datetime, timedelta
 import random
 
-seven_days_ago = datetime.combine(datetime.today() - timedelta(7),
-                                  datetime.min.time())
+
 args = {
     'owner': 'airflow',
-    'start_date': seven_days_ago,
+    'start_date': airflow.utils.dates.days_ago(2)
 }
 
 dag = DAG(

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_branch_python_dop_operator_3.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_branch_python_dop_operator_3.py b/airflow/example_dags/example_branch_python_dop_operator_3.py
index 1dd190e..6da7b68 100644
--- a/airflow/example_dags/example_branch_python_dop_operator_3.py
+++ b/airflow/example_dags/example_branch_python_dop_operator_3.py
@@ -13,16 +13,15 @@
 # limitations under the License.
 #
 
+import airflow
 from airflow.operators.python_operator import BranchPythonOperator
 from airflow.operators.dummy_operator import DummyOperator
 from airflow.models import DAG
 from datetime import datetime, timedelta
 
-two_days_ago = datetime.combine(datetime.today() - timedelta(2),
-                                  datetime.min.time())
 args = {
     'owner': 'airflow',
-    'start_date': two_days_ago,
+    'start_date': airflow.utils.dates.days_ago(2),
     'depends_on_past': True,
 }
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_http_operator.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_http_operator.py b/airflow/example_dags/example_http_operator.py
index 18a67f5..0cc23b9 100644
--- a/airflow/example_dags/example_http_operator.py
+++ b/airflow/example_dags/example_http_operator.py
@@ -14,19 +14,18 @@
 """
 ### Example HTTP operator and sensor
 """
+import airflow
 from airflow import DAG
 from airflow.operators.http_operator import SimpleHttpOperator
 from airflow.operators.sensors import HttpSensor
-from datetime import datetime, timedelta
+from datetime import timedelta
 import json
 
-seven_days_ago = datetime.combine(datetime.today() - timedelta(7),
-                                  datetime.min.time())
 
 default_args = {
     'owner': 'airflow',
     'depends_on_past': False,
-    'start_date': seven_days_ago,
+    'start_date': airflow.utils.dates.days_ago(2),
     'email': ['airflow@airflow.com'],
     'email_on_failure': False,
     'email_on_retry': False,

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_latest_only.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_latest_only.py b/airflow/example_dags/example_latest_only.py
index 9ce03b9..38ee900 100644
--- a/airflow/example_dags/example_latest_only.py
+++ b/airflow/example_dags/example_latest_only.py
@@ -16,16 +16,16 @@ Example of the LatestOnlyOperator
 """
 import datetime as dt
 
+import airflow
 from airflow.models import DAG
 from airflow.operators.dummy_operator import DummyOperator
 from airflow.operators.latest_only_operator import LatestOnlyOperator
 from airflow.utils.trigger_rule import TriggerRule
 
-
 dag = DAG(
     dag_id='latest_only',
     schedule_interval=dt.timedelta(hours=4),
-    start_date=dt.datetime(2016, 9, 20),
+    start_date=airflow.utils.dates.days_ago(2),
 )
 
 latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_latest_only_with_trigger.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_latest_only_with_trigger.py b/airflow/example_dags/example_latest_only_with_trigger.py
index e3a88b7..f2afdcf 100644
--- a/airflow/example_dags/example_latest_only_with_trigger.py
+++ b/airflow/example_dags/example_latest_only_with_trigger.py
@@ -16,16 +16,16 @@ Example LatestOnlyOperator and TriggerRule interactions
 """
 import datetime as dt
 
+import airflow
 from airflow.models import DAG
 from airflow.operators.dummy_operator import DummyOperator
 from airflow.operators.latest_only_operator import LatestOnlyOperator
 from airflow.utils.trigger_rule import TriggerRule
 
-
 dag = DAG(
     dag_id='latest_only_with_trigger',
     schedule_interval=dt.timedelta(hours=4),
-    start_date=dt.datetime(2016, 9, 20),
+    start_date=airflow.utils.dates.days_ago(2),
 )
 
 latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_passing_params_via_test_command.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_passing_params_via_test_command.py b/airflow/example_dags/example_passing_params_via_test_command.py
index e337f3b..448effb 100644
--- a/airflow/example_dags/example_passing_params_via_test_command.py
+++ b/airflow/example_dags/example_passing_params_via_test_command.py
@@ -13,15 +13,15 @@
 # limitations under the License.
 #
 
-from datetime import datetime, timedelta
-
+from datetime import timedelta
+import airflow
 from airflow import DAG
 from airflow.operators.bash_operator import BashOperator
 from airflow.operators.python_operator import PythonOperator
 
 dag = DAG("example_passing_params_via_test_command",
           default_args={"owner": "airflow",
-                        "start_date":datetime.now()},
+                        "start_date": airflow.utils.dates.days_ago(1)},
           schedule_interval='*/1 * * * *',
           dagrun_timeout=timedelta(minutes=4)
           )

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_python_operator.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_python_operator.py b/airflow/example_dags/example_python_operator.py
index c5d7193..8108e1e 100644
--- a/airflow/example_dags/example_python_operator.py
+++ b/airflow/example_dags/example_python_operator.py
@@ -13,19 +13,16 @@
 # limitations under the License.
 from __future__ import print_function
 from builtins import range
+import airflow
 from airflow.operators.python_operator import PythonOperator
 from airflow.models import DAG
-from datetime import datetime, timedelta
 
 import time
 from pprint import pprint
 
-seven_days_ago = datetime.combine(
-        datetime.today() - timedelta(7), datetime.min.time())
-
 args = {
     'owner': 'airflow',
-    'start_date': seven_days_ago,
+    'start_date': airflow.utils.dates.days_ago(2)
 }
 
 dag = DAG(

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_short_circuit_operator.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_short_circuit_operator.py b/airflow/example_dags/example_short_circuit_operator.py
index 92efe99..c9812ac 100644
--- a/airflow/example_dags/example_short_circuit_operator.py
+++ b/airflow/example_dags/example_short_circuit_operator.py
@@ -11,17 +11,16 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import airflow
 from airflow.operators.python_operator import ShortCircuitOperator
 from airflow.operators.dummy_operator import DummyOperator
 from airflow.models import DAG
 import airflow.utils.helpers
-from datetime import datetime, timedelta
 
-seven_days_ago = datetime.combine(datetime.today() - timedelta(7),
-                                  datetime.min.time())
+
 args = {
     'owner': 'airflow',
-    'start_date': seven_days_ago,
+    'start_date': airflow.utils.dates.days_ago(2)
 }
 
 dag = DAG(dag_id='example_short_circuit_operator', default_args=args)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_skip_dag.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_skip_dag.py b/airflow/example_dags/example_skip_dag.py
index a38b126..b936020 100644
--- a/airflow/example_dags/example_skip_dag.py
+++ b/airflow/example_dags/example_skip_dag.py
@@ -12,16 +12,15 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+import airflow
 from airflow.operators.dummy_operator import DummyOperator
 from airflow.models import DAG
-from datetime import datetime, timedelta
 from airflow.exceptions import AirflowSkipException
 
-seven_days_ago = datetime.combine(datetime.today() - timedelta(1),
-                                  datetime.min.time())
+
 args = {
     'owner': 'airflow',
-    'start_date': seven_days_ago,
+    'start_date': airflow.utils.dates.days_ago(2)
 }
 
 
@@ -53,5 +52,3 @@ def create_test_pipeline(suffix, trigger_rule, dag):
 
 create_test_pipeline('1', 'all_success', dag)
 create_test_pipeline('2', 'one_success', dag)
-
-

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_subdag_operator.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_subdag_operator.py b/airflow/example_dags/example_subdag_operator.py
index b872f43..0c11787 100644
--- a/airflow/example_dags/example_subdag_operator.py
+++ b/airflow/example_dags/example_subdag_operator.py
@@ -11,7 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-from datetime import datetime
+import airflow
 
 from airflow.models import DAG
 from airflow.operators.dummy_operator import DummyOperator
@@ -24,7 +24,7 @@ DAG_NAME = 'example_subdag_operator'
 
 args = {
     'owner': 'airflow',
-    'start_date': datetime(2016, 1, 1),
+    'start_date': airflow.utils.dates.days_ago(2),
 }
 
 dag = DAG(

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/example_xcom.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/example_xcom.py b/airflow/example_dags/example_xcom.py
index 50728c3..b41421b 100644
--- a/airflow/example_dags/example_xcom.py
+++ b/airflow/example_dags/example_xcom.py
@@ -12,22 +12,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 from __future__ import print_function
+import airflow
 from airflow import DAG
 from airflow.operators.python_operator import PythonOperator
-from datetime import datetime, timedelta
 
-seven_days_ago = datetime.combine(
-    datetime.today() - timedelta(7),
-    datetime.min.time())
 args = {
     'owner': 'airflow',
-    'start_date': seven_days_ago,
+    'start_date': airflow.utils.dates.days_ago(2),
     'provide_context': True
 }
 
 dag = DAG(
     'example_xcom',
-    start_date=datetime(2015, 1, 1),
     schedule_interval="@once",
     default_args=args)
 
@@ -60,6 +56,7 @@ def puller(**kwargs):
     v1, v2 = ti.xcom_pull(key=None, task_ids=['push', 'push_by_returning'])
     assert (v1, v2) == (value_1, value_2)
 
+
 push1 = PythonOperator(
     task_id='push', dag=dag, python_callable=push)
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/test_utils.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/test_utils.py b/airflow/example_dags/test_utils.py
index 70391c3..0ed9bdb 100644
--- a/airflow/example_dags/test_utils.py
+++ b/airflow/example_dags/test_utils.py
@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """Used for unit tests"""
+import airflow
 from airflow.operators.bash_operator import BashOperator
 from airflow.models import DAG
 from datetime import datetime
@@ -25,5 +26,5 @@ task = BashOperator(
     task_id='sleeps_forever',
     dag=dag,
     bash_command="sleep 10000000000",
-    start_date=datetime(2016, 1, 1),
+    start_date=airflow.utils.dates.days_ago(2),
     owner='airflow')

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/example_dags/tutorial.py
----------------------------------------------------------------------
diff --git a/airflow/example_dags/tutorial.py b/airflow/example_dags/tutorial.py
index c7b2e0f..6ede09a 100644
--- a/airflow/example_dags/tutorial.py
+++ b/airflow/example_dags/tutorial.py
@@ -17,19 +17,18 @@
 Documentation that goes along with the Airflow tutorial located
 [here](http://pythonhosted.org/airflow/tutorial.html)
 """
+import airflow
 from airflow import DAG
 from airflow.operators.bash_operator import BashOperator
-from datetime import datetime, timedelta
+from datetime import timedelta
 
-seven_days_ago = datetime.combine(datetime.today() - timedelta(7),
-                                  datetime.min.time())
 
 # these args will get passed on to each operator
 # you can override them on a per-task basis during operator initialization
 default_args = {
     'owner': 'airflow',
     'depends_on_past': False,
-    'start_date': seven_days_ago,
+    'start_date': airflow.utils.dates.days_ago(2),
     'email': ['airflow@airflow.com'],
     'email_on_failure': False,
     'email_on_retry': False,

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/airflow/utils/dates.py
----------------------------------------------------------------------
diff --git a/airflow/utils/dates.py b/airflow/utils/dates.py
index 84fd791..f89b20c 100644
--- a/airflow/utils/dates.py
+++ b/airflow/utils/dates.py
@@ -212,3 +212,16 @@ def scale_time_units(time_seconds_arr, unit):
     elif unit == 'days':
         return list(map(lambda x: x*1.0/(24*60*60), time_seconds_arr))
     return time_seconds_arr
+
+
+def days_ago(n, hour=0, minute=0, second=0, microsecond=0):
+    """
+    Get a datetime object representing `n` days ago. By default the time is
+    set to midnight.
+    """
+    today = datetime.today().replace(
+        hour=hour,
+        minute=minute,
+        second=second,
+        microsecond=microsecond)
+    return today - timedelta(days=n)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/dags/test_dag.py
----------------------------------------------------------------------
diff --git a/dags/test_dag.py b/dags/test_dag.py
index a1cbb74..db0b648 100644
--- a/dags/test_dag.py
+++ b/dags/test_dag.py
@@ -24,7 +24,7 @@ DAG_NAME = 'test_dag_v1'
 default_args = {
     'owner': 'airflow',
     'depends_on_past': True,
-    'start_date': START_DATE,
+    'start_date': airflow.utils.dates.days_ago(2)
 }
 dag = DAG(DAG_NAME, schedule_interval='*/10 * * * *', default_args=default_args)
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/scripts/perf/dags/perf_dag_1.py
----------------------------------------------------------------------
diff --git a/scripts/perf/dags/perf_dag_1.py b/scripts/perf/dags/perf_dag_1.py
index d97c830..fe71303 100644
--- a/scripts/perf/dags/perf_dag_1.py
+++ b/scripts/perf/dags/perf_dag_1.py
@@ -11,15 +11,14 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import airflow
 from airflow.operators.bash_operator import BashOperator
 from airflow.models import DAG
-from datetime import datetime, timedelta
+from datetime import timedelta
 
-five_days_ago = datetime.combine(datetime.today() - timedelta(5),
-                                 datetime.min.time())
 args = {
     'owner': 'airflow',
-    'start_date': five_days_ago,
+    'start_date': airflow.utils.dates.days_ago(3),
 }
 
 dag = DAG(

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/scripts/perf/dags/perf_dag_2.py
----------------------------------------------------------------------
diff --git a/scripts/perf/dags/perf_dag_2.py b/scripts/perf/dags/perf_dag_2.py
index cccd547..16948d4 100644
--- a/scripts/perf/dags/perf_dag_2.py
+++ b/scripts/perf/dags/perf_dag_2.py
@@ -11,15 +11,15 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
+import airflow
 from airflow.operators.bash_operator import BashOperator
 from airflow.models import DAG
-from datetime import datetime, timedelta
+from datetime import timedelta
 
-five_days_ago = datetime.combine(datetime.today() - timedelta(5),
-                                 datetime.min.time())
 args = {
     'owner': 'airflow',
-    'start_date': five_days_ago,
+    'start_date': airflow.utils.dates.days_ago(3),
 }
 
 dag = DAG(

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/tests/utils/__init__.py
----------------------------------------------------------------------
diff --git a/tests/utils/__init__.py b/tests/utils/__init__.py
new file mode 100644
index 0000000..6b15998
--- /dev/null
+++ b/tests/utils/__init__.py
@@ -0,0 +1,16 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .compression import *
+from .dates import *

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3658bf31/tests/utils/dates.py
----------------------------------------------------------------------
diff --git a/tests/utils/dates.py b/tests/utils/dates.py
new file mode 100644
index 0000000..dc0c87e
--- /dev/null
+++ b/tests/utils/dates.py
@@ -0,0 +1,45 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from datetime import datetime, timedelta
+import unittest
+
+from airflow.utils import dates
+
+class Dates(unittest.TestCase):
+
+    def test_days_ago(self):
+        today = datetime.today()
+        today_midnight = datetime.fromordinal(today.date().toordinal())
+
+        self.assertTrue(dates.days_ago(0) == today_midnight)
+
+        self.assertTrue(
+            dates.days_ago(100) == today_midnight + timedelta(days=-100))
+
+        self.assertTrue(
+            dates.days_ago(0, hour=3) == today_midnight + timedelta(hours=3))
+        self.assertTrue(
+            dates.days_ago(0, minute=3)
+            == today_midnight + timedelta(minutes=3))
+        self.assertTrue(
+            dates.days_ago(0, second=3)
+            == today_midnight + timedelta(seconds=3))
+        self.assertTrue(
+            dates.days_ago(0, microsecond=3)
+            == today_midnight + timedelta(microseconds=3))
+
+
+if __name__ == '__main__':
+    unittest.main()


[08/45] incubator-airflow git commit: Add known issue of 'num_runs'

Posted by bo...@apache.org.
Add known issue of 'num_runs'


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/8aacc283
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/8aacc283
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/8aacc283

Branch: refs/heads/v1-8-stable
Commit: 8aacc283a6b3a605648bf4bd1361225a2a3678d9
Parents: 7925bed
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Fri Feb 10 14:53:02 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Fri Feb 10 14:55:49 2017 +0100

----------------------------------------------------------------------
 UPDATING.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8aacc283/UPDATING.md
----------------------------------------------------------------------
diff --git a/UPDATING.md b/UPDATING.md
index b56aca8..b0ab212 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -43,7 +43,7 @@ loops. This is now time bound and defaults to `-1`, which means run continuously
 #### num_runs
 Previously `num_runs` was used to let the scheduler terminate after a certain amount of loops. Now num_runs specifies 
 the number of times to try to schedule each DAG file within `run_duration` time. Defaults to `-1`, which means try
-indefinitely.
+indefinitely. This is only available on the command line.
 
 #### min_file_process_interval
 After how much time should an updated DAG be picked up from the filesystem.
@@ -107,6 +107,21 @@ supported and will be removed entirely in Airflow 2.0
   Previously, `Operator.__init__()` accepted any arguments (either positional `*args` or keyword `**kwargs`) without 
   complaint. Now, invalid arguments will be rejected. (https://github.com/apache/incubator-airflow/pull/1285)
 
+### Known Issues
+There is a report that the default of `-1` for num_runs causes errors to be reported while parsing tasks.
+This has not been confirmed, but a workaround is to change the default back to `None`.
+
+To do this, edit `cli.py` and find the following:
+
+```
+        'num_runs': Arg(
+            ("-n", "--num_runs"),
+            default=-1, type=int,
+            help="Set the number of runs to execute before exiting"),
+```
+
+and change `default=-1` to `default=None`. Please report on the mailing list if you have this issue.
+
 ## Airflow 1.7.1.2
 
 ### Changes to Configuration

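For reference, the `num_runs` Arg in the snippet above is wired to the scheduler's `-n`/`--num_runs` command-line flag, so the setting is used like this (the value 10 is only an example):

```
airflow scheduler --num_runs 10
```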

[14/45] incubator-airflow git commit: [AIRFLOW-895] Address Apache release incompliancies

Posted by bo...@apache.org.
[AIRFLOW-895] Address Apache release incompliancies

* Fixes missing licenses in NOTICE
* Corrects license header
* Removes HighCharts leftovers

Closes #2098 from bolkedebruin/AIRFLOW-895

(cherry picked from commit 784b3638c5633a9a94e020c47a3b95b942e6fb87)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/8ad9ab67
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/8ad9ab67
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/8ad9ab67

Branch: refs/heads/v1-8-stable
Commit: 8ad9ab673350207479e9597a36aadb1ec9987640
Parents: b38df6b
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Thu Feb 23 23:48:03 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Thu Feb 23 23:48:19 2017 +0100

----------------------------------------------------------------------
 MANIFEST.in                          |   9 +-
 NOTICE                               |   6 +-
 airflow/www/static/d3.tip.v0.6.3.js  |  35 +++---
 airflow/www/static/dagre-d3.js       |   3 +-
 airflow/www/static/heatmap-canvas.js | 194 ------------------------------
 airflow/www/static/heatmap.js        |  23 ----
 airflow/www/static/nvd3.tar.gz       | Bin 328377 -> 0 bytes
 setup.py                             |   8 +-
 8 files changed, 37 insertions(+), 241 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8ad9ab67/MANIFEST.in
----------------------------------------------------------------------
diff --git a/MANIFEST.in b/MANIFEST.in
index 0aea6b5..717b077 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -11,7 +11,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+include NOTICE
+include LICENSE
+include DISCLAIMER
+include CHANGELOG.txt
+include README.md
 graft airflow/www/templates
 graft airflow/www/static
 include airflow/alembic.ini
-include requirements.txt
+graft scripts/systemd
+graft scripts/upstart
+

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8ad9ab67/NOTICE
----------------------------------------------------------------------
diff --git a/NOTICE b/NOTICE
index 79f43d9..3bda66e 100644
--- a/NOTICE
+++ b/NOTICE
@@ -10,9 +10,11 @@ This product includes WebGL-2D.js (https://github.com/gameclosure/webgl-2d), Cop
 This product includes Bootstrap (http://getbootstrap.com - MIT license), Copyright (c) 2011-2016 Twitter, Inc.
 This product includes Bootstrap Toggle (http://www.bootstraptoggle.com - MIT license), Copyright 2014 Min Hur, The New York Times Company.
 This product includes Clock plugin (https://github.com/Lwangaman/jQuery-Clock-Plugin - Dual licensed under the MIT and GPL licenses), Copyright (c) 2010 John R D'Orazio (donjohn.fmmi@gmail.com)
-This product includes DataTables (datatables.net - datatables.net/license), Copyright © 2008-2015 SpryMedia Ltd.
+This product includes DataTables (datatables.net - MIT License), Copyright © 2008-2015 SpryMedia Ltd.
 This product includes Underscore.js (http://underscorejs.org - MIT license), Copyright (c) 2011-2013 Jeremy Ashkenas, DocumentCloud and Investigative Reporters & Editors.
 This product includes FooTable (http://fooplugins.com/plugins/footable-jquery/ - MIT license), Copyright 2013 Steven Usher & Brad Vincent.
 This product includes dagre (https://github.com/cpettitt/dagre - MIT license), Copyright (c) 2012-2014 Chris Pettitt.
-This product includes d3js (https://d3js.org/ - https://github.com/mbostock/d3/blob/master/LICENSE), Copyright (c) 2010-2016, Michael Bostock.
+This product includes d3js (https://d3js.org/ - BSD License), Copyright (c) 2010-2016, Michael Bostock.
 This product includes flask-kerberos (https://github.com/mkomitee/flask-kerberos - BSD License), Copyright (c) 2013, Michael Komitee
+This product includes ace (https://github.com/ajaxorg/ace - BSD License), Copyright (c) 2010, Ajax.org B.V.
+This product includes d3 tip (https://github.com/Caged/d3-tip - MIT License), Copyright (c) 2013-2017 Justin Palmer

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8ad9ab67/airflow/www/static/d3.tip.v0.6.3.js
----------------------------------------------------------------------
diff --git a/airflow/www/static/d3.tip.v0.6.3.js b/airflow/www/static/d3.tip.v0.6.3.js
index 9413a78..b2b48fb 100644
--- a/airflow/www/static/d3.tip.v0.6.3.js
+++ b/airflow/www/static/d3.tip.v0.6.3.js
@@ -1,20 +1,23 @@
 /**
-* Licensed to the Apache Software Foundation (ASF) under one
-* or more contributor license agreements. See the NOTICE file
-* distributed with this work for additional information
-* regarding copyright ownership. The ASF licenses this file
-* to you under the Apache License, Version 2.0 (the
-* "License"); you may not use this file except in compliance
-* with the License. You may obtain a copy of the License at
-*
-* http://www.apache.org/licenses/LICENSE-2.0
-*
-* Unless required by applicable law or agreed to in writing,
-* software distributed under the License is distributed on an
-* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-* KIND, either express or implied. See the License for the
-* specific language governing permissions and limitations
-* under the License.
+ * The MIT License (MIT)
+ * Copyright (c) 2013 Justin Palmer
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy of
+ * this software and associated documentation files (the "Software"), to deal in
+ * the Software without restriction, including without limitation the rights to
+ * use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+ * of the Software, and to permit persons to whom the Software is furnished to do
+ * so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
+ * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
+ * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */
 
 // d3.tip

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8ad9ab67/airflow/www/static/dagre-d3.js
----------------------------------------------------------------------
diff --git a/airflow/www/static/dagre-d3.js b/airflow/www/static/dagre-d3.js
index ec26bfe..2da7cdd 100644
--- a/airflow/www/static/dagre-d3.js
+++ b/airflow/www/static/dagre-d3.js
@@ -1,5 +1,6 @@
 ;(function e(t,n,r){function s(o,u){if(!n[o]){if(!t[o]){var a=typeof require=="function"&&require;if(!u&&a)return a(o,!0);if(i)return i(o,!0);throw new Error("Cannot find module '"+o+"'")}var f=n[o]={exports:{}};t[o][0].call(f.exports,function(e){var n=t[o][1][e];return s(n?n:e)},f,f.exports,e,t,n,r)}return n[o].exports}var i=typeof require=="function"&&require;for(var o=0;o<r.length;o++)s(r[o]);return s})({1:[function(require,module,exports){
-var global=self;/**
+var global=self;
+/**
  * @license
  * Copyright (c) 2012-2013 Chris Pettitt
  *

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8ad9ab67/airflow/www/static/heatmap-canvas.js
----------------------------------------------------------------------
diff --git a/airflow/www/static/heatmap-canvas.js b/airflow/www/static/heatmap-canvas.js
deleted file mode 100644
index 01ee471..0000000
--- a/airflow/www/static/heatmap-canvas.js
+++ /dev/null
@@ -1,194 +0,0 @@
-/**
-* Licensed to the Apache Software Foundation (ASF) under one
-* or more contributor license agreements. See the NOTICE file
-* distributed with this work for additional information
-* regarding copyright ownership. The ASF licenses this file
-* to you under the Apache License, Version 2.0 (the
-* "License"); you may not use this file except in compliance
-* with the License. You may obtain a copy of the License at
-*
-* http://www.apache.org/licenses/LICENSE-2.0
-*
-* Unless required by applicable law or agreed to in writing,
-* software distributed under the License is distributed on an
-* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-* KIND, either express or implied. See the License for the
-* specific language governing permissions and limitations
-* under the License.
-*/
-
-
-    /**
-     * This plugin extends Highcharts in two ways:
-     * - Use HTML5 canvas instead of SVG for rendering of the heatmap squares. Canvas
-     *   outperforms SVG when it comes to thousands of single shapes.
-     * - Add a K-D-tree to find the nearest point on mouse move. Since we no longer have SVG shapes
-     *   to capture mouseovers, we need another way of detecting hover points for the tooltip.
-     */
-    (function (H) {
-        var wrap = H.wrap,
-            seriesTypes = H.seriesTypes;
-
-        /**
-         * Recursively builds a K-D-tree
-         */
-        function KDTree(points, depth) {
-            var axis, median, length = points && points.length;
-
-            if (length) {
-
-                // alternate between the axis
-                axis = ['plotX', 'plotY'][depth % 2];
-
-                // sort point array
-                points.sort(function (a, b) {
-                    return a[axis] - b[axis];
-                });
-
-                median = Math.floor(length / 2);
-
-                // build and return node
-                return {
-                    point: points[median],
-                    left: KDTree(points.slice(0, median), depth + 1),
-                    right: KDTree(points.slice(median + 1), depth + 1)
-                };
-
-            }
-        }
-
-        /**
-         * Recursively searches for the nearest neighbour using the given K-D-tree
-         */
-        function nearest(search, tree, depth) {
-            var point = tree.point,
-                axis = ['plotX', 'plotY'][depth % 2],
-                tdist,
-                sideA,
-                sideB,
-                ret = point,
-                nPoint1,
-                nPoint2;
-
-            // Get distance
-            point.dist = Math.pow(search.plotX - point.plotX, 2) +
-                Math.pow(search.plotY - point.plotY, 2);
-
-            // Pick side based on distance to splitting point
-            tdist = search[axis] - point[axis];
-            sideA = tdist < 0 ? 'left' : 'right';
-
-            // End of tree
-            if (tree[sideA]) {
-                nPoint1 = nearest(search, tree[sideA], depth + 1);
-
-                ret = (nPoint1.dist < ret.dist ? nPoint1 : point);
-
-                sideB = tdist < 0 ? 'right' : 'left';
-                if (tree[sideB]) {
-                    // compare distance to current best to splitting point to decide wether to check side B or not
-                    if (Math.abs(tdist) < ret.dist) {
-                        nPoint2 = nearest(search, tree[sideB], depth + 1);
-                        ret = (nPoint2.dist < ret.dist ? nPoint2 : ret);
-                    }
-                }
-            }
-            return ret;
-        }
-
-        // Extend the heatmap to use the K-D-tree to search for nearest points
-        H.seriesTypes.heatmap.prototype.setTooltipPoints = function () {
-            var series = this;
-
-            this.tree = null;
-            setTimeout(function () {
-                series.tree = KDTree(series.points, 0);
-            });
-        };
-        H.seriesTypes.heatmap.prototype.getNearest = function (search) {
-            if (this.tree) {
-                return nearest(search, this.tree, 0);
-            }
-        };
-
-        H.wrap(H.Pointer.prototype, 'runPointActions', function (proceed, e) {
-            var chart = this.chart;
-            proceed.call(this, e);
-
-            // Draw independent tooltips
-            H.each(chart.series, function (series) {
-                var point;
-                if (series.getNearest) {
-                    point = series.getNearest({
-                        plotX: e.chartX - chart.plotLeft,
-                        plotY: e.chartY - chart.plotTop
-                    });
-                    if (point) {
-                        point.onMouseOver(e);
-                    }
-                }
-            })
-        });
-
-        /**
-         * Get the canvas context for a series
-         */
-        H.Series.prototype.getContext = function () {
-            var canvas;
-            if (!this.ctx) {
-                canvas = document.createElement('canvas');
-                canvas.setAttribute('width', this.chart.plotWidth);
-                canvas.setAttribute('height', this.chart.plotHeight);
-                canvas.style.position = 'absolute';
-                canvas.style.left = this.group.translateX + 'px';
-                canvas.style.top = this.group.translateY + 'px';
-                canvas.style.zIndex = 0;
-                canvas.style.cursor = 'crosshair';
-                this.chart.container.appendChild(canvas);
-                if (canvas.getContext) {
-                    this.ctx = canvas.getContext('2d');
-                }
-            }
-            return this.ctx;
-        }
-
-        /**
-         * Wrap the drawPoints method to draw the points in canvas instead of the slower SVG,
-         * that requires one shape each point.
-         */
-        H.wrap(H.seriesTypes.heatmap.prototype, 'drawPoints', function (proceed) {
-
-            var ctx;
-            if (this.chart.renderer.forExport) {
-                // Run SVG shapes
-                proceed.call(this);
-
-            } else {
-
-                if (ctx = this.getContext()) {
-
-                    // draw the columns
-                    H.each(this.points, function (point) {
-                        var plotY = point.plotY,
-                            shapeArgs;
-
-                        if (plotY !== undefined && !isNaN(plotY) && point.y !== null) {
-                            shapeArgs = point.shapeArgs;
-
-                            ctx.fillStyle = point.pointAttr[''].fill;
-                            ctx.fillRect(shapeArgs.x, shapeArgs.y, shapeArgs.width, shapeArgs.height);
-                        }
-                    });
-
-                } else {
-                    this.chart.showLoading("Your browser doesn't support HTML5 canvas, <br>please use a modern browser");
-
-                    // Uncomment this to provide low-level (slow) support in oldIE. It will cause script errors on
-                    // charts with more than a few thousand points.
-                    //proceed.call(this);
-                }
-            }
-        });
-    }(Highcharts));
-
-

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8ad9ab67/airflow/www/static/heatmap.js
----------------------------------------------------------------------
diff --git a/airflow/www/static/heatmap.js b/airflow/www/static/heatmap.js
deleted file mode 100644
index fc9b856..0000000
--- a/airflow/www/static/heatmap.js
+++ /dev/null
@@ -1,23 +0,0 @@
-/*
- Highcharts JS v4.0.4 (2014-09-02)
-
- (c) 2011-2014 Torstein Honsi
-
- License: www.highcharts.com/license
-*/
-(function(h){var k=h.Axis,y=h.Chart,l=h.Color,z=h.Legend,t=h.LegendSymbolMixin,u=h.Series,v=h.SVGRenderer,w=h.getOptions(),i=h.each,r=h.extend,A=h.extendClass,m=h.merge,o=h.pick,x=h.numberFormat,p=h.seriesTypes,s=h.wrap,n=function(){},q=h.ColorAxis=function(){this.isColorAxis=!0;this.init.apply(this,arguments)};r(q.prototype,k.prototype);r(q.prototype,{defaultColorAxisOptions:{lineWidth:0,gridLineWidth:1,tickPixelInterval:72,startOnTick:!0,endOnTick:!0,offset:0,marker:{animation:{duration:50},color:"gray",
-width:0.01},labels:{overflow:"justify"},minColor:"#EFEFFF",maxColor:"#003875",tickLength:5},init:function(b,a){var c=b.options.legend.layout!=="vertical",d;d=m(this.defaultColorAxisOptions,{side:c?2:1,reversed:!c},a,{isX:c,opposite:!c,showEmpty:!1,title:null,isColor:!0});k.prototype.init.call(this,b,d);a.dataClasses&&this.initDataClasses(a);this.initStops(a);this.isXAxis=!0;this.horiz=c;this.zoomEnabled=!1},tweenColors:function(b,a,c){var d=a.rgba[3]!==1||b.rgba[3]!==1;return(d?"rgba(":"rgb(")+Math.round(a.rgba[0]+
-(b.rgba[0]-a.rgba[0])*(1-c))+","+Math.round(a.rgba[1]+(b.rgba[1]-a.rgba[1])*(1-c))+","+Math.round(a.rgba[2]+(b.rgba[2]-a.rgba[2])*(1-c))+(d?","+(a.rgba[3]+(b.rgba[3]-a.rgba[3])*(1-c)):"")+")"},initDataClasses:function(b){var a=this,c=this.chart,d,e=0,f=this.options,g=b.dataClasses.length;this.dataClasses=d=[];this.legendItems=[];i(b.dataClasses,function(b,h){var i,b=m(b);d.push(b);if(!b.color)f.dataClassColor==="category"?(i=c.options.colors,b.color=i[e++],e===i.length&&(e=0)):b.color=a.tweenColors(l(f.minColor),
-l(f.maxColor),g<2?0.5:h/(g-1))})},initStops:function(b){this.stops=b.stops||[[0,this.options.minColor],[1,this.options.maxColor]];i(this.stops,function(a){a.color=l(a[1])})},setOptions:function(b){k.prototype.setOptions.call(this,b);this.options.crosshair=this.options.marker;this.coll="colorAxis"},setAxisSize:function(){var b=this.legendSymbol,a=this.chart,c,d,e;if(b)this.left=c=b.attr("x"),this.top=d=b.attr("y"),this.width=e=b.attr("width"),this.height=b=b.attr("height"),this.right=a.chartWidth-
-c-e,this.bottom=a.chartHeight-d-b,this.len=this.horiz?e:b,this.pos=this.horiz?c:d},toColor:function(b,a){var c,d=this.stops,e,f=this.dataClasses,g,j;if(f)for(j=f.length;j--;){if(g=f[j],e=g.from,d=g.to,(e===void 0||b>=e)&&(d===void 0||b<=d)){c=g.color;if(a)a.dataClass=j;break}}else{this.isLog&&(b=this.val2lin(b));c=1-(this.max-b)/(this.max-this.min||1);for(j=d.length;j--;)if(c>d[j][0])break;e=d[j]||d[j+1];d=d[j+1]||e;c=1-(d[0]-c)/(d[0]-e[0]||1);c=this.tweenColors(e.color,d.color,c)}return c},getOffset:function(){var b=
-this.legendGroup,a=this.chart.axisOffset[this.side];if(b){k.prototype.getOffset.call(this);if(!this.axisGroup.parentGroup)this.axisGroup.add(b),this.gridGroup.add(b),this.labelGroup.add(b),this.added=!0;this.chart.axisOffset[this.side]=a}},setLegendColor:function(){var b,a=this.options;b=this.horiz?[0,0,1,0]:[0,0,0,1];this.legendColor={linearGradient:{x1:b[0],y1:b[1],x2:b[2],y2:b[3]},stops:a.stops||[[0,a.minColor],[1,a.maxColor]]}},drawLegendSymbol:function(b,a){var c=b.padding,d=b.options,e=this.horiz,
-f=o(d.symbolWidth,e?200:12),g=o(d.symbolHeight,e?12:200),j=o(d.labelPadding,e?16:30),d=o(d.itemDistance,10);this.setLegendColor();a.legendSymbol=this.chart.renderer.rect(0,b.baseline-11,f,g).attr({zIndex:1}).add(a.legendGroup);a.legendSymbol.getBBox();this.legendItemWidth=f+c+(e?d:j);this.legendItemHeight=g+c+(e?j:0)},setState:n,visible:!0,setVisible:n,getSeriesExtremes:function(){var b;if(this.series.length)b=this.series[0],this.dataMin=b.valueMin,this.dataMax=b.valueMax},drawCrosshair:function(b,
-a){var c=!this.cross,d=a&&a.plotX,e=a&&a.plotY,f,g=this.pos,j=this.len;if(a)f=this.toPixels(a.value),f<g?f=g-2:f>g+j&&(f=g+j+2),a.plotX=f,a.plotY=this.len-f,k.prototype.drawCrosshair.call(this,b,a),a.plotX=d,a.plotY=e,!c&&this.cross&&this.cross.attr({fill:this.crosshair.color}).add(this.labelGroup)},getPlotLinePath:function(b,a,c,d,e){return e?this.horiz?["M",e-4,this.top-6,"L",e+4,this.top-6,e,this.top,"Z"]:["M",this.left,e,"L",this.left-6,e+6,this.left-6,e-6,"Z"]:k.prototype.getPlotLinePath.call(this,
-b,a,c,d)},update:function(b,a){i(this.series,function(a){a.isDirtyData=!0});k.prototype.update.call(this,b,a);this.legendItem&&(this.setLegendColor(),this.chart.legend.colorizeItem(this,!0))},getDataClassLegendSymbols:function(){var b=this,a=this.chart,c=this.legendItems,d=a.options.legend,e=d.valueDecimals,f=d.valueSuffix||"",g;c.length||i(this.dataClasses,function(d,h){var k=!0,l=d.from,m=d.to;g="";l===void 0?g="< ":m===void 0&&(g="> ");l!==void 0&&(g+=x(l,e)+f);l!==void 0&&m!==void 0&&(g+=" - ");
-m!==void 0&&(g+=x(m,e)+f);c.push(r({chart:a,name:g,options:{},drawLegendSymbol:t.drawRectangle,visible:!0,setState:n,setVisible:function(){k=this.visible=!k;i(b.series,function(a){i(a.points,function(a){a.dataClass===h&&a.setVisible(k)})});a.legend.colorizeItem(this,k)}},d))});return c},name:""});i(["fill","stroke"],function(b){HighchartsAdapter.addAnimSetter(b,function(a){a.elem.attr(b,q.prototype.tweenColors(l(a.start),l(a.end),a.pos))})});s(y.prototype,"getAxes",function(b){var a=this.options.colorAxis;
-b.call(this);this.colorAxis=[];a&&new q(this,a)});s(z.prototype,"getAllItems",function(b){var a=[],c=this.chart.colorAxis[0];c&&(c.options.dataClasses?a=a.concat(c.getDataClassLegendSymbols()):a.push(c),i(c.series,function(a){a.options.showInLegend=!1}));return a.concat(b.call(this))});h={pointAttrToOptions:{stroke:"borderColor","stroke-width":"borderWidth",fill:"color",dashstyle:"dashStyle"},pointArrayMap:["value"],axisTypes:["xAxis","yAxis","colorAxis"],optionalAxis:"colorAxis",trackerGroups:["group",
-"markerGroup","dataLabelsGroup"],getSymbol:n,parallelArrays:["x","y","value"],colorKey:"value",translateColors:function(){var b=this,a=this.options.nullColor,c=this.colorAxis,d=this.colorKey;i(this.data,function(e){var f=e[d];if(f=f===null?a:c&&f!==void 0?c.toColor(f,e):e.color||b.color)e.color=f})}};s(v.prototype,"buildText",function(b,a){var c=a.styles&&a.styles.HcTextStroke;b.call(this,a);c&&a.applyTextStroke&&a.applyTextStroke(c)});v.prototype.Element.prototype.applyTextStroke=function(b){var a=
-this.element,c,d,b=b.split(" ");c=a.getElementsByTagName("tspan");d=a.firstChild;this.ySetter=this.xSetter;i([].slice.call(c),function(c,f){var g;f===0&&(c.setAttribute("x",a.getAttribute("x")),(f=a.getAttribute("y"))!==null&&c.setAttribute("y",f));g=c.cloneNode(1);g.setAttribute("stroke",b[1]);g.setAttribute("stroke-width",b[0]);g.setAttribute("stroke-linejoin","round");a.insertBefore(g,d)})};w.plotOptions.heatmap=m(w.plotOptions.scatter,{animation:!1,borderWidth:0,nullColor:"#F8F8F8",dataLabels:{formatter:function(){return this.point.value},
-verticalAlign:"middle",crop:!1,overflow:!1,style:{color:"white",fontWeight:"bold",HcTextStroke:"1px rgba(0,0,0,0.5)"}},marker:null,tooltip:{pointFormat:"{point.x}, {point.y}: {point.value}<br/>"},states:{normal:{animation:!0},hover:{brightness:0.2}}});p.heatmap=A(p.scatter,m(h,{type:"heatmap",pointArrayMap:["y","value"],hasPointSpecificOptions:!0,supportsDrilldown:!0,getExtremesFromAll:!0,init:function(){p.scatter.prototype.init.apply(this,arguments);this.pointRange=this.options.colsize||1;this.yAxis.axisPointRange=
-this.options.rowsize||1},translate:function(){var b=this.options,a=this.xAxis,c=this.yAxis;this.generatePoints();i(this.points,function(d){var e=(b.colsize||1)/2,f=(b.rowsize||1)/2,g=Math.round(a.len-a.translate(d.x-e,0,1,0,1)),e=Math.round(a.len-a.translate(d.x+e,0,1,0,1)),h=Math.round(c.translate(d.y-f,0,1,0,1)),f=Math.round(c.translate(d.y+f,0,1,0,1));d.plotX=(g+e)/2;d.plotY=(h+f)/2;d.shapeType="rect";d.shapeArgs={x:Math.min(g,e),y:Math.min(h,f),width:Math.abs(e-g),height:Math.abs(f-h)}});this.translateColors();
-this.chart.hasRendered&&i(this.points,function(a){a.shapeArgs.fill=a.options.color||a.color})},drawPoints:p.column.prototype.drawPoints,animate:n,getBox:n,drawLegendSymbol:t.drawRectangle,getExtremes:function(){u.prototype.getExtremes.call(this,this.valueData);this.valueMin=this.dataMin;this.valueMax=this.dataMax;u.prototype.getExtremes.call(this)}}))})(Highcharts);

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8ad9ab67/airflow/www/static/nvd3.tar.gz
----------------------------------------------------------------------
diff --git a/airflow/www/static/nvd3.tar.gz b/airflow/www/static/nvd3.tar.gz
deleted file mode 100644
index fd9504d..0000000
Binary files a/airflow/www/static/nvd3.tar.gz and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8ad9ab67/setup.py
----------------------------------------------------------------------
diff --git a/setup.py b/setup.py
index c644eed..43b97d3 100644
--- a/setup.py
+++ b/setup.py
@@ -274,11 +274,11 @@ def do_setup():
             'Programming Language :: Python :: 3.4',
             'Topic :: System :: Monitoring',
         ],
-        author='Maxime Beauchemin',
-        author_email='maximebeauchemin@gmail.com',
-        url='https://github.com/apache/incubator-airflow',
+        author='Apache Software Foundation',
+        author_email='dev@airflow.incubator.apache.org',
+        url='http://airflow.incubator.apache.org/',
         download_url=(
-            'https://github.com/apache/incubator-airflow/tarball/' + version),
+            'https://dist.apache.org/repos/dist/release/incubator/airflow/' + version),
         cmdclass={
             'test': Tox,
             'extra_clean': CleanCommand,


[21/45] incubator-airflow git commit: [AIRFLOW-830][AIRFLOW-829][AIRFLOW-88] Reduce Travis log verbosity

Posted by bo...@apache.org.
[AIRFLOW-830][AIRFLOW-829][AIRFLOW-88] Reduce Travis log verbosity

[AIRFLOW-829][AIRFLOW-88] Reduce verbosity of
Travis tests

Remove the -s flag for Travis unit tests to
suppress output
from successful tests.

[AIRFLOW-830] Reduce plugins manager verbosity

The plugin manager prints all status to INFO,
which is unnecessary and
overly verbose.

Closes #2049 from jlowin/reduce-logs
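
As a rough sketch of the verbosity change (simplified, not the actual
plugins_manager code), routine per-plugin messages are logged at DEBUG
so they disappear at the default INFO level; log_plugin_import below is
a hypothetical helper used only for illustration:

    import logging

    def log_plugin_import(filepath):
        # Routine bookkeeping stays at DEBUG; it only shows up when the
        # level is lowered, e.g. via --logging-level=DEBUG in the tests.
        logging.debug('Importing plugin module %s', filepath)

    logging.basicConfig(level=logging.INFO)
    log_plugin_import('/plugins/my_plugin.py')   # silent at INFO
    logging.getLogger().setLevel(logging.DEBUG)
    log_plugin_import('/plugins/my_plugin.py')   # now visible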


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3b1e81ac
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3b1e81ac
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3b1e81ac

Branch: refs/heads/v1-8-stable
Commit: 3b1e81ac9e8e97b6d2a4c3217df81db9ddbd0900
Parents: e1d0adb
Author: Jeremiah Lowin <jl...@apache.org>
Authored: Wed Feb 8 08:32:25 2017 -0500
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:03:25 2017 -0700

----------------------------------------------------------------------
 airflow/plugins_manager.py |  4 ++--
 run_unit_tests.sh          | 36 ++++++++++++++++++++++++------------
 2 files changed, 26 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3b1e81ac/airflow/plugins_manager.py
----------------------------------------------------------------------
diff --git a/airflow/plugins_manager.py b/airflow/plugins_manager.py
index e0af20c..83aae23 100644
--- a/airflow/plugins_manager.py
+++ b/airflow/plugins_manager.py
@@ -72,7 +72,7 @@ for root, dirs, files in os.walk(plugins_folder, followlinks=True):
             if file_ext != '.py':
                 continue
 
-            logging.info('Importing plugin module ' + filepath)
+            logging.debug('Importing plugin module ' + filepath)
             # normalize root path as namespace
             namespace = '_'.join([re.sub(norm_pattern, '__', root), mod_name])
 
@@ -92,7 +92,7 @@ for root, dirs, files in os.walk(plugins_folder, followlinks=True):
 
 
 def make_module(name, objects):
-    logging.info('Creating module ' + name)
+    logging.debug('Creating module ' + name)
     name = name.lower()
     module = imp.new_module(name)
     module._name = name.split('.')[-1]

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3b1e81ac/run_unit_tests.sh
----------------------------------------------------------------------
diff --git a/run_unit_tests.sh b/run_unit_tests.sh
index c922a55..b3ee074 100755
--- a/run_unit_tests.sh
+++ b/run_unit_tests.sh
@@ -28,17 +28,6 @@ export AIRFLOW_USE_NEW_IMPORTS=1
 # any argument received is overriding the default nose execution arguments:
 
 nose_args=$@
-if [ -z "$nose_args" ]; then
-  nose_args="--with-coverage \
---cover-erase \
---cover-html \
---cover-package=airflow \
---cover-html-dir=airflow/www/static/coverage \
---with-ignore-docstrings \
--s \
--v \
---logging-level=DEBUG "
-fi
 
 #--with-doctest
 
@@ -50,7 +39,18 @@ yes | airflow resetdb
 airflow initdb
 
 if [ "${TRAVIS}" ]; then
-  # For impersonation tests running on SQLite on Travis, make the database world readable so other 
+    if [ -z "$nose_args" ]; then
+      nose_args="--with-coverage \
+    --cover-erase \
+    --cover-html \
+    --cover-package=airflow \
+    --cover-html-dir=airflow/www/static/coverage \
+    --with-ignore-docstrings \
+    -v \
+    --logging-level=DEBUG "
+    fi
+
+  # For impersonation tests running on SQLite on Travis, make the database world readable so other
   # users can update it
   AIRFLOW_DB="/home/travis/airflow/airflow.db"
   if [ -f "${AIRFLOW_DB}" ]; then
@@ -60,6 +60,18 @@ if [ "${TRAVIS}" ]; then
   # For impersonation tests on Travis, make airflow accessible to other users via the global PATH
   # (which contains /usr/local/bin)
   sudo ln -s "${VIRTUAL_ENV}/bin/airflow" /usr/local/bin/
+else
+    if [ -z "$nose_args" ]; then
+      nose_args="--with-coverage \
+    --cover-erase \
+    --cover-html \
+    --cover-package=airflow \
+    --cover-html-dir=airflow/www/static/coverage \
+    --with-ignore-docstrings \
+    -s \
+    -v \
+    --logging-level=DEBUG "
+    fi
 fi
 
 echo "Starting the unit tests with the following nose arguments: "$nose_args


[22/45] incubator-airflow git commit: [AIRFLOW-856] Make sure execution date is set for local client

Posted by bo...@apache.org.
[AIRFLOW-856] Make sure execution date is set for local client

In the local API client the execution date was
hard-coded to None. Secondly, when no execution date
was specified, the execution date was set to
datetime.now(). datetime.now() includes fractional
seconds, which are supported in the database but not
in, among others, the current logging setup. We now
cut off the fractional seconds from the execution
date.

Closes #2064 from bolkedebruin/AIRFLOW-856
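
A minimal sketch of the truncation described above, assuming a plain
datetime value; the helper name is illustrative, not Airflow API:

    import datetime

    def normalize_execution_date(execution_date=None):
        # Default to "now" when the caller supplies no date, then drop
        # microseconds so the value round-trips cleanly through logging
        # and run_id formatting.
        if execution_date is None:
            execution_date = datetime.datetime.now()
        return execution_date.replace(microsecond=0)

    run_id = "manual__{0}".format(normalize_execution_date().isoformat())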


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3918e5e1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3918e5e1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3918e5e1

Branch: refs/heads/v1-8-stable
Commit: 3918e5e1c489bf01a6a836d1d76e2251137af5de
Parents: 3b1e81a
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Fri Feb 10 14:17:26 2017 +0100
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:09:43 2017 -0700

----------------------------------------------------------------------
 airflow/api/client/local_client.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3918e5e1/airflow/api/client/local_client.py
----------------------------------------------------------------------
diff --git a/airflow/api/client/local_client.py b/airflow/api/client/local_client.py
index 05f27f6..5422aa3 100644
--- a/airflow/api/client/local_client.py
+++ b/airflow/api/client/local_client.py
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+    # -*- coding: utf-8 -*-
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.


[06/45] incubator-airflow git commit: [AIRFLOW-856] Make sure execution date is set for local client

Posted by bo...@apache.org.
[AIRFLOW-856] Make sure execution date is set for local client

In the local API client the execution date was
hard-coded to None. Secondly, when no execution date
was specified, the execution date was set to
datetime.now(). datetime.now() includes fractional
seconds, which are supported in the database but not
in, among others, the current logging setup. We now
cut off the fractional seconds from the execution
date.

Closes #2064 from bolkedebruin/AIRFLOW-856

(cherry picked from commit b7c828bf094d3aa1eae310979a82addf7e423bb0)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/fb88c2d8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/fb88c2d8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/fb88c2d8

Branch: refs/heads/v1-8-stable
Commit: fb88c2d8362d751f902252c51c8bce4301ac8c40
Parents: adaebc2
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Fri Feb 10 14:17:26 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Fri Feb 10 14:17:44 2017 +0100

----------------------------------------------------------------------
 airflow/api/client/local_client.py             |   2 +-
 airflow/api/common/experimental/trigger_dag.py |   9 +-
 tests/__init__.py                              |   1 +
 tests/api/__init__.py                          |  17 ++++
 tests/api/client/__init__.py                   |  13 +++
 tests/api/client/local_client.py               | 107 ++++++++++++++++++++
 6 files changed, 144 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/fb88c2d8/airflow/api/client/local_client.py
----------------------------------------------------------------------
diff --git a/airflow/api/client/local_client.py b/airflow/api/client/local_client.py
index a4d1f93..05f27f6 100644
--- a/airflow/api/client/local_client.py
+++ b/airflow/api/client/local_client.py
@@ -21,5 +21,5 @@ class Client(api_client.Client):
         dr = trigger_dag.trigger_dag(dag_id=dag_id,
                                      run_id=run_id,
                                      conf=conf,
-                                     execution_date=None)
+                                     execution_date=execution_date)
         return "Created {}".format(dr)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/fb88c2d8/airflow/api/common/experimental/trigger_dag.py
----------------------------------------------------------------------
diff --git a/airflow/api/common/experimental/trigger_dag.py b/airflow/api/common/experimental/trigger_dag.py
index 0905017..2c5a462 100644
--- a/airflow/api/common/experimental/trigger_dag.py
+++ b/airflow/api/common/experimental/trigger_dag.py
@@ -12,15 +12,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from datetime import datetime
+import datetime
 import json
 
 from airflow.exceptions import AirflowException
 from airflow.models import DagRun, DagBag
 from airflow.utils.state import State
 
-import logging
-
 
 def trigger_dag(dag_id, run_id=None, conf=None, execution_date=None):
     dagbag = DagBag()
@@ -31,7 +29,10 @@ def trigger_dag(dag_id, run_id=None, conf=None, execution_date=None):
     dag = dagbag.get_dag(dag_id)
 
     if not execution_date:
-        execution_date = datetime.now()
+        execution_date = datetime.datetime.now()
+
+    assert isinstance(execution_date, datetime.datetime)
+    execution_date = execution_date.replace(microsecond=0)
 
     if not run_id:
         run_id = "manual__{0}".format(execution_date.isoformat())

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/fb88c2d8/tests/__init__.py
----------------------------------------------------------------------
diff --git a/tests/__init__.py b/tests/__init__.py
index e1e8551..7ddf22d 100644
--- a/tests/__init__.py
+++ b/tests/__init__.py
@@ -14,6 +14,7 @@
 
 from __future__ import absolute_import
 
+from .api import *
 from .configuration import *
 from .contrib import *
 from .core import *

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/fb88c2d8/tests/api/__init__.py
----------------------------------------------------------------------
diff --git a/tests/api/__init__.py b/tests/api/__init__.py
new file mode 100644
index 0000000..2db97ad
--- /dev/null
+++ b/tests/api/__init__.py
@@ -0,0 +1,17 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+
+from .client import *

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/fb88c2d8/tests/api/client/__init__.py
----------------------------------------------------------------------
diff --git a/tests/api/client/__init__.py b/tests/api/client/__init__.py
new file mode 100644
index 0000000..9d7677a
--- /dev/null
+++ b/tests/api/client/__init__.py
@@ -0,0 +1,13 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/fb88c2d8/tests/api/client/local_client.py
----------------------------------------------------------------------
diff --git a/tests/api/client/local_client.py b/tests/api/client/local_client.py
new file mode 100644
index 0000000..a36b71f
--- /dev/null
+++ b/tests/api/client/local_client.py
@@ -0,0 +1,107 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import unittest
+import datetime
+
+from mock import patch
+
+from airflow import AirflowException
+from airflow import models
+
+from airflow.api.client.local_client import Client
+from airflow.utils.state import State
+
+EXECDATE = datetime.datetime.now()
+EXECDATE_NOFRACTIONS = EXECDATE.replace(microsecond=0)
+EXECDATE_ISO = EXECDATE_NOFRACTIONS.isoformat()
+
+real_datetime_class = datetime.datetime
+
+
+def mock_datetime_now(target, dt):
+    class DatetimeSubclassMeta(type):
+        @classmethod
+        def __instancecheck__(mcs, obj):
+            return isinstance(obj, real_datetime_class)
+
+    class BaseMockedDatetime(real_datetime_class):
+        @classmethod
+        def now(cls, tz=None):
+            return target.replace(tzinfo=tz)
+
+        @classmethod
+        def utcnow(cls):
+            return target
+
+    # Python2 & Python3 compatible metaclass
+    MockedDatetime = DatetimeSubclassMeta('datetime', (BaseMockedDatetime,), {})
+
+    return patch.object(dt, 'datetime', MockedDatetime)
+
+
+class TestLocalClient(unittest.TestCase):
+    def setUp(self):
+        self.client = Client(api_base_url=None, auth=None)
+
+    @patch.object(models.DAG, 'create_dagrun')
+    def test_trigger_dag(self, mock):
+        client = self.client
+
+        # non existent
+        with self.assertRaises(AirflowException):
+            client.trigger_dag(dag_id="blablabla")
+
+        import airflow.api.common.experimental.trigger_dag
+        with mock_datetime_now(EXECDATE, airflow.api.common.experimental.trigger_dag.datetime):
+            # no execution date, execution date should be set automatically
+            client.trigger_dag(dag_id="test_start_date_scheduling")
+            mock.assert_called_once_with(run_id="manual__{0}".format(EXECDATE_ISO),
+                                         execution_date=EXECDATE_NOFRACTIONS,
+                                         state=State.RUNNING,
+                                         conf=None,
+                                         external_trigger=True)
+            mock.reset_mock()
+
+            # execution date with microseconds cutoff
+            client.trigger_dag(dag_id="test_start_date_scheduling", execution_date=EXECDATE)
+            mock.assert_called_once_with(run_id="manual__{0}".format(EXECDATE_ISO),
+                                         execution_date=EXECDATE_NOFRACTIONS,
+                                         state=State.RUNNING,
+                                         conf=None,
+                                         external_trigger=True)
+            mock.reset_mock()
+
+            # run id
+            run_id = "my_run_id"
+            client.trigger_dag(dag_id="test_start_date_scheduling", run_id=run_id)
+            mock.assert_called_once_with(run_id=run_id,
+                                         execution_date=EXECDATE_NOFRACTIONS,
+                                         state=State.RUNNING,
+                                         conf=None,
+                                         external_trigger=True)
+            mock.reset_mock()
+
+            # test conf
+            conf = '{"name": "John"}'
+            client.trigger_dag(dag_id="test_start_date_scheduling", conf=conf)
+            mock.assert_called_once_with(run_id="manual__{0}".format(EXECDATE_ISO),
+                                         execution_date=EXECDATE_NOFRACTIONS,
+                                         state=State.RUNNING,
+                                         conf=json.loads(conf),
+                                         external_trigger=True)
+            mock.reset_mock()
+
+            # this is a unit test only, cannot verify existing dag run


[30/45] incubator-airflow git commit: [AIRFLOW-937] Improve performance of task_stats

Posted by bo...@apache.org.
[AIRFLOW-937] Improve performance of task_stats

Please accept this PR that addresses the following
issues:
- https://issues.apache.org/jira/browse/AIRFLOW-937

Testing Done:
- Shouldn't change functionality significantly,
should pass existing tests (if they exist)

This leads to slightly different results, but it
reduced the time of this endpoint from 90s to 9s
on our data, and the existing logic for task_ids
was already incorrect (task_ids may not be
distinct across dags)

Closes #2121 from saguziel/task-stats-fix
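
A hedged sketch of the query shape this change moves to (names follow
airflow.www.views, but the snippet is simplified and assumes the DagRun
and TaskInstance models, State, and an open `session` from the
surrounding module): compute the latest non-running DagRun per dag in a
subquery and join task instances against it, instead of filtering on
large IN lists of task_ids and dag_ids collected from the DagBag:

    import sqlalchemy as sqla
    from sqlalchemy import and_

    last_dag_run = (
        session.query(DagRun.dag_id,
                      sqla.func.max(DagRun.execution_date).label('execution_date'))
        .filter(DagRun.state != State.RUNNING)
        .group_by(DagRun.dag_id)
        .subquery('last_dag_run')
    )
    last_ti = (
        session.query(TI.dag_id.label('dag_id'), TI.state.label('state'))
        .join(last_dag_run, and_(
            last_dag_run.c.dag_id == TI.dag_id,
            last_dag_run.c.execution_date == TI.execution_date))
    )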


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/66f39ca0
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/66f39ca0
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/66f39ca0

Branch: refs/heads/v1-8-stable
Commit: 66f39ca0c3511da2ff86858ce7ea569d11adbd44
Parents: 0964f18
Author: Alex Guziel <al...@airbnb.com>
Authored: Thu Mar 2 14:04:49 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:21:13 2017 -0700

----------------------------------------------------------------------
 airflow/www/views.py | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/66f39ca0/airflow/www/views.py
----------------------------------------------------------------------
diff --git a/airflow/www/views.py b/airflow/www/views.py
index d8acfef..d1a1f9a 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -497,26 +497,24 @@ class Airflow(BaseView):
 
     @expose('/task_stats')
     def task_stats(self):
-        task_ids = []
-        dag_ids = []
-        for dag in dagbag.dags.values():
-            task_ids += dag.task_ids
-            if not dag.is_subdag:
-                dag_ids.append(dag.dag_id)
-
         TI = models.TaskInstance
         DagRun = models.DagRun
+        Dag = models.DagModel
         session = Session()
 
         LastDagRun = (
             session.query(DagRun.dag_id, sqla.func.max(DagRun.execution_date).label('execution_date'))
+            .join(Dag, Dag.dag_id == DagRun.dag_id)
             .filter(DagRun.state != State.RUNNING)
+            .filter(Dag.is_active == 1)
             .group_by(DagRun.dag_id)
             .subquery('last_dag_run')
         )
         RunningDagRun = (
             session.query(DagRun.dag_id, DagRun.execution_date)
+            .join(Dag, Dag.dag_id == DagRun.dag_id)
             .filter(DagRun.state == State.RUNNING)
+            .filter(Dag.is_active == 1)
             .subquery('running_dag_run')
         )
 
@@ -527,16 +525,12 @@ class Airflow(BaseView):
             .join(LastDagRun, and_(
                 LastDagRun.c.dag_id == TI.dag_id,
                 LastDagRun.c.execution_date == TI.execution_date))
-            .filter(TI.task_id.in_(task_ids))
-            .filter(TI.dag_id.in_(dag_ids))
         )
         RunningTI = (
             session.query(TI.dag_id.label('dag_id'), TI.state.label('state'))
             .join(RunningDagRun, and_(
                 RunningDagRun.c.dag_id == TI.dag_id,
                 RunningDagRun.c.execution_date == TI.execution_date))
-            .filter(TI.task_id.in_(task_ids))
-            .filter(TI.dag_id.in_(dag_ids))
         )
 
         UnionTI = union_all(LastTI, RunningTI).alias('union_ti')


[23/45] incubator-airflow git commit: [AIRFLOW-853] use utf8 encoding for stdout line decode

Posted by bo...@apache.org.
[AIRFLOW-853] use utf8 encoding for stdout line decode

Closes #2060 from ming-wu/master
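
A small, self-contained illustration of the issue (a sketch, not the
operator code): bytes read from a subprocess pipe should be decoded
with an explicit encoding rather than the platform default:

    import subprocess

    sp = subprocess.Popen(['echo', 'héllo wörld'], stdout=subprocess.PIPE)
    for line in iter(sp.stdout.readline, b''):
        # Decode explicitly as UTF-8 so non-ASCII output does not blow
        # up on systems where the default codec is ASCII.
        print(line.decode('utf_8').strip())
    sp.wait()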


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/10170085
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/10170085
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/10170085

Branch: refs/heads/v1-8-stable
Commit: 101700853896fdb90cda4267b5310e6c8811f4f0
Parents: 3918e5e
Author: Ming Wu <mi...@ubisoft.com>
Authored: Fri Feb 10 19:47:47 2017 -0500
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:10:12 2017 -0700

----------------------------------------------------------------------
 airflow/contrib/operators/ssh_execute_operator.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/10170085/airflow/contrib/operators/ssh_execute_operator.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/operators/ssh_execute_operator.py b/airflow/contrib/operators/ssh_execute_operator.py
index dd4c3b4..dd9e197 100644
--- a/airflow/contrib/operators/ssh_execute_operator.py
+++ b/airflow/contrib/operators/ssh_execute_operator.py
@@ -142,7 +142,7 @@ class SSHExecuteOperator(BaseOperator):
             logging.info("Output:")
             line = ''
             for line in iter(sp.stdout.readline, b''):
-                line = line.decode().strip()
+                line = line.decode('utf_8').strip()
                 logging.info(line)
             sp.wait()
             logging.info("Command exited with "


[04/45] incubator-airflow git commit: [AIRFLOW-814] Fix Presto*CheckOperator.__init__

Posted by bo...@apache.org.
[AIRFLOW-814] Fix Presto*CheckOperator.__init__

Use keyword args when initializing a
Presto*CheckOperator.

Closes #2029 from patrickmckenna/fix-presto-check-operators

(cherry picked from commit d428a90286a8d34db65bb8f4d8252fbbe9665e55)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>
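
The gist of the fix as a hedged, simplified sketch (standalone classes,
not the real operators): when a subclass also forwards *args/**kwargs,
passing the base-class arguments positionally can bind them to the
wrong parameters, so they are passed by keyword instead:

    class ValueCheckOperator(object):
        def __init__(self, sql, pass_value, tolerance=None, *args, **kwargs):
            self.sql = sql
            self.pass_value = pass_value
            self.tolerance = tolerance

    class PrestoValueCheckOperator(ValueCheckOperator):
        def __init__(self, sql, pass_value, tolerance=None,
                     presto_conn_id='presto_default', *args, **kwargs):
            # Keyword arguments keep sql/pass_value/tolerance bound to
            # the intended base-class parameters.
            super(PrestoValueCheckOperator, self).__init__(
                sql=sql, pass_value=pass_value, tolerance=tolerance,
                *args, **kwargs)
            self.presto_conn_id = presto_conn_id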


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/adaebc2d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/adaebc2d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/adaebc2d

Branch: refs/heads/v1-8-stable
Commit: adaebc2d7afea4b996a0f49ee850bdb6dd6a0cfc
Parents: 0b47790
Author: Patrick McKenna <pa...@github.com>
Authored: Tue Feb 7 21:54:13 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Tue Feb 7 21:55:06 2017 +0100

----------------------------------------------------------------------
 airflow/operators/presto_check_operator.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/adaebc2d/airflow/operators/presto_check_operator.py
----------------------------------------------------------------------
diff --git a/airflow/operators/presto_check_operator.py b/airflow/operators/presto_check_operator.py
index 6dfcdec..e6e1fd8 100644
--- a/airflow/operators/presto_check_operator.py
+++ b/airflow/operators/presto_check_operator.py
@@ -80,7 +80,9 @@ class PrestoValueCheckOperator(ValueCheckOperator):
             self, sql, pass_value, tolerance=None,
             presto_conn_id='presto_default',
             *args, **kwargs):
-        super(PrestoValueCheckOperator, self).__init__(sql, pass_value, tolerance, *args, **kwargs)
+        super(PrestoValueCheckOperator, self).__init__(
+            sql=sql, pass_value=pass_value, tolerance=tolerance,
+            *args, **kwargs)
         self.presto_conn_id = presto_conn_id
 
     def get_db_hook(self):
@@ -110,7 +112,8 @@ class PrestoIntervalCheckOperator(IntervalCheckOperator):
             presto_conn_id='presto_default',
             *args, **kwargs):
         super(PrestoIntervalCheckOperator, self).__init__(
-            table, metrics_thresholds, date_filter_column, days_back,
+            table=table, metrics_thresholds=metrics_thresholds,
+            date_filter_column=date_filter_column, days_back=days_back,
             *args, **kwargs)
         self.presto_conn_id = presto_conn_id
 


[16/45] incubator-airflow git commit: [AIRFLOW-931] Do not set QUEUED in TaskInstances

Posted by bo...@apache.org.
[AIRFLOW-931] Do not set QUEUED in TaskInstances

The contract of TaskInstances stipulates that the end
states for tasks can only be UP_FOR_RETRY, SUCCESS,
FAILED, UPSTREAM_FAILED or SKIPPED. If concurrency was
reached, task instances set themselves to QUEUED, which
prevented the scheduler from picking them up again.

We now set the state to NONE to ensure integrity.

Closes #2127 from bolkedebruin/AIRFLOW-931

(cherry picked from commit e42398100a3248eddb6b511ade73f6a239e58090)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>
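
A hedged, simplified sketch of the new behaviour (the helper below is
illustrative only, not Airflow API): when the runnable check fails at
task runtime, the task instance hands itself back to the scheduler by
clearing its state instead of parking itself in QUEUED:

    from datetime import datetime

    def reschedule_instead_of_queue(ti, session):
        # NONE (i.e. no state) lets the scheduler consider the task
        # instance again; QUEUED would never be picked up again.
        ti.state = None
        ti.queued_dttm = datetime.now()
        session.merge(ti)
        session.commit()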


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/4db8f079
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/4db8f079
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/4db8f079

Branch: refs/heads/v1-8-stable
Commit: 4db8f0796642691255b0632d599f33cb9d0ce423
Parents: 3a5a323
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Thu Mar 9 08:32:46 2017 -0800
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Thu Mar 9 08:32:59 2017 -0800

----------------------------------------------------------------------
 airflow/models.py | 27 ++++++++++++++-------------
 tests/models.py   | 13 +++++++++++++
 2 files changed, 27 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4db8f079/airflow/models.py
----------------------------------------------------------------------
diff --git a/airflow/models.py b/airflow/models.py
index ba8d051..62457f0 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -1291,19 +1291,20 @@ class TaskInstance(Base):
             verbose=True)
 
         if not runnable and not mark_success:
-            if self.state != State.QUEUED:
-                # If a task's dependencies are met but it can't be run yet then queue it
-                # instead
-                self.state = State.QUEUED
-                msg = "Queuing attempt {attempt} of {total}".format(
-                    attempt=self.try_number % (task.retries + 1) + 1,
-                    total=task.retries + 1)
-                logging.info(hr + msg + hr)
-
-                self.queued_dttm = datetime.now()
-                msg = "Queuing into pool {}".format(self.pool)
-                logging.info(msg)
-                session.merge(self)
+            # FIXME: we might have hit concurrency limits, which means we probably
+            # have been running prematurely. This should be handled in the
+            # scheduling mechanism.
+            self.state = State.NONE
+            msg = ("FIXME: Rescheduling due to concurrency limits reached at task "
+                   "runtime. Attempt {attempt} of {total}. State set to NONE.").format(
+                attempt=self.try_number % (task.retries + 1) + 1,
+                total=task.retries + 1)
+            logging.warning(hr + msg + hr)
+
+            self.queued_dttm = datetime.now()
+            msg = "Queuing into pool {}".format(self.pool)
+            logging.info(msg)
+            session.merge(self)
             session.commit()
             return
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4db8f079/tests/models.py
----------------------------------------------------------------------
diff --git a/tests/models.py b/tests/models.py
index 868ea36..867e293 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -289,6 +289,19 @@ class TaskInstanceTest(unittest.TestCase):
         dag >> op5
         self.assertIs(op5.dag, dag)
 
+    @patch.object(DAG, 'concurrency_reached')
+    def test_requeue_over_concurrency(self, mock_concurrency_reached):
+        mock_concurrency_reached.return_value = True
+
+        dag = DAG(dag_id='test_requeue_over_concurrency', start_date=DEFAULT_DATE,
+                  max_active_runs=1, concurrency=2)
+        task = DummyOperator(task_id='test_requeue_over_concurrency_op', dag=dag)
+
+        ti = TI(task=task, execution_date=datetime.datetime.now())
+        ti.run()
+        self.assertEqual(ti.state, models.State.NONE)
+
+
     @patch.object(TI, 'pool_full')
     def test_run_pooling_task(self, mock_pool_full):
         """


[36/45] incubator-airflow git commit: [AIRFLOW-961] run onkill when SIGTERMed

Posted by bo...@apache.org.
[AIRFLOW-961] run onkill when SIGTERMed

Closes #2138 from saguziel/aguziel-sigterm
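
A minimal, self-contained sketch of the pattern (not the LocalTaskJob
code): register a SIGTERM handler that runs the cleanup hook before
surfacing the termination as an error:

    import logging
    import signal

    class Job(object):
        def on_kill(self):
            logging.error("Killing subprocess")   # cleanup would go here

        def run(self):
            def signal_handler(signum, frame):
                # Run on_kill on SIGTERM, then fail the job explicitly.
                self.on_kill()
                raise RuntimeError("Job received SIGTERM signal")
            signal.signal(signal.SIGTERM, signal_handler)
            # ... the actual task work would run here ...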


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/dacc69a5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/dacc69a5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/dacc69a5

Branch: refs/heads/v1-8-stable
Commit: dacc69a504cbfcdba5e2b24220fa1982637b17d3
Parents: dcc8ede
Author: Alex Guziel <al...@airbnb.com>
Authored: Sat Mar 11 10:43:49 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:34:09 2017 -0700

----------------------------------------------------------------------
 airflow/jobs.py | 8 ++++++++
 1 file changed, 8 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/dacc69a5/airflow/jobs.py
----------------------------------------------------------------------
diff --git a/airflow/jobs.py b/airflow/jobs.py
index b6913f3..36548c2 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -2060,6 +2060,14 @@ class LocalTaskJob(BaseJob):
 
     def _execute(self):
         self.task_runner = get_task_runner(self)
+
+        def signal_handler(signum, frame):
+            '''Setting kill signal handler'''
+            logging.error("Killing subprocess")
+            self.on_kill()
+            raise AirflowException("LocalTaskJob received SIGTERM signal")
+        signal.signal(signal.SIGTERM, signal_handler)
+
         try:
             self.task_runner.start()
 


[31/45] incubator-airflow git commit: [AIRFLOW-938] Use test for True in task_stats queries

Posted by bo...@apache.org.
[AIRFLOW-938] Use test for True in task_stats queries

Fix a bug in the task_stats query on Postgres, which
doesn't support comparing a boolean column with == 1.

https://issues.apache.org/jira/browse/AIRFLOW-938

I've seen the other PR, but this approach should work
because `__eq__(True)` is just `== True`, and it is the
form documented at
http://docs.sqlalchemy.org/en/latest/core/sqlelement.html#sqlalchemy.sql.expression.and_
(the trailing underscore is part of the link).

Closes #2123 from saguziel/aguziel-fix-task-stats-2
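
For illustration (a sketch assuming a declarative model with a Boolean
column named is_active and an open `session`, as in views.py):

    # SQLAlchemy renders this as a proper boolean test, which works on
    # PostgreSQL where comparing against the integer 1 does not.
    active = session.query(Dag).filter(Dag.is_active == True)   # noqa: E712
    # An equivalent, lint-friendly spelling:
    active = session.query(Dag).filter(Dag.is_active.is_(True))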


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/157054e2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/157054e2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/157054e2

Branch: refs/heads/v1-8-stable
Commit: 157054e2c9967e48fb3f3157081baf686dcee5e8
Parents: 66f39ca
Author: Alex Guziel <al...@airbnb.com>
Authored: Fri Mar 3 13:52:03 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:21:23 2017 -0700

----------------------------------------------------------------------
 airflow/www/views.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/157054e2/airflow/www/views.py
----------------------------------------------------------------------
diff --git a/airflow/www/views.py b/airflow/www/views.py
index d1a1f9a..962c1f0 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -506,7 +506,7 @@ class Airflow(BaseView):
             session.query(DagRun.dag_id, sqla.func.max(DagRun.execution_date).label('execution_date'))
             .join(Dag, Dag.dag_id == DagRun.dag_id)
             .filter(DagRun.state != State.RUNNING)
-            .filter(Dag.is_active == 1)
+            .filter(Dag.is_active == True)
             .group_by(DagRun.dag_id)
             .subquery('last_dag_run')
         )
@@ -514,7 +514,7 @@ class Airflow(BaseView):
             session.query(DagRun.dag_id, DagRun.execution_date)
             .join(Dag, Dag.dag_id == DagRun.dag_id)
             .filter(DagRun.state == State.RUNNING)
-            .filter(Dag.is_active == 1)
+            .filter(Dag.is_active == True)
             .subquery('running_dag_run')
         )
 


[12/45] incubator-airflow git commit: [AIRFLOW-793] Enable compressed loading in S3ToHiveTransfer

Posted by bo...@apache.org.
[AIRFLOW-793] Enable compressed loading in S3ToHiveTransfer

Testing Done:
- Added new unit tests for the S3ToHiveTransfer
module

Closes #2012 from krishnabhupatiraju/S3ToHiveTransfer_compress_loading

(cherry picked from commit ad15f5efd6c663bd5f0c8cd3f556d08182cc778c)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>
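
A hedged sketch of the decompression helper added here (simplified
relative to airflow.utils.compression.uncompress_file): choose the
file class by extension and stream the contents into a temporary file:

    import bz2
    import gzip
    import shutil
    from tempfile import NamedTemporaryFile

    def uncompress_file(input_file_name, file_extension, dest_dir):
        # Only gz and bz2 are handled; anything else is rejected.
        openers = {'.gz': gzip.GzipFile, '.bz2': bz2.BZ2File}
        if file_extension.lower() not in openers:
            raise NotImplementedError(
                "Only gz and bz2 files can currently be uncompressed.")
        with openers[file_extension.lower()](input_file_name, mode='rb') as f_in, \
                NamedTemporaryFile(dir=dest_dir, mode='wb',
                                   delete=False) as f_out:
            shutil.copyfileobj(f_in, f_out)
        return f_out.name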


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/1c231333
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/1c231333
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/1c231333

Branch: refs/heads/v1-8-stable
Commit: 1c2313338a586aae4a7752c3fb3b9de4e3564415
Parents: 3658bf3
Author: Krishna Bhupatiraju <kr...@airbnb.com>
Authored: Mon Feb 6 16:52:11 2017 -0800
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Sat Feb 18 15:56:37 2017 +0100

----------------------------------------------------------------------
 airflow/operators/s3_to_hive_operator.py | 151 ++++++++++++----
 airflow/utils/compression.py             |  38 ++++
 tests/operators/__init__.py              |   1 +
 tests/operators/s3_to_hive_operator.py   | 247 ++++++++++++++++++++++++++
 tests/utils/compression.py               |  97 ++++++++++
 5 files changed, 497 insertions(+), 37 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1c231333/airflow/operators/s3_to_hive_operator.py
----------------------------------------------------------------------
diff --git a/airflow/operators/s3_to_hive_operator.py b/airflow/operators/s3_to_hive_operator.py
index 3e01c29..92340f8 100644
--- a/airflow/operators/s3_to_hive_operator.py
+++ b/airflow/operators/s3_to_hive_operator.py
@@ -16,13 +16,18 @@ from builtins import next
 from builtins import zip
 import logging
 from tempfile import NamedTemporaryFile
+from airflow.utils.file import TemporaryDirectory
+import gzip
+import bz2
+import tempfile
+import os
 
 from airflow.exceptions import AirflowException
 from airflow.hooks.S3_hook import S3Hook
 from airflow.hooks.hive_hooks import HiveCliHook
 from airflow.models import BaseOperator
 from airflow.utils.decorators import apply_defaults
-
+from airflow.utils.compression import uncompress_file
 
 class S3ToHiveTransfer(BaseOperator):
     """
@@ -68,8 +73,11 @@ class S3ToHiveTransfer(BaseOperator):
     :type delimiter: str
     :param s3_conn_id: source s3 connection
     :type s3_conn_id: str
-    :param hive_conn_id: destination hive connection
-    :type hive_conn_id: str
+    :param hive_cli_conn_id: destination hive connection
+    :type hive_cli_conn_id: str
+    :param input_compressed: Boolean to determine if file decompression is
+        required to process headers
+    :type input_compressed: bool
     """
 
     template_fields = ('s3_key', 'partition', 'hive_table')
@@ -91,6 +99,7 @@ class S3ToHiveTransfer(BaseOperator):
             wildcard_match=False,
             s3_conn_id='s3_default',
             hive_cli_conn_id='hive_cli_default',
+            input_compressed=False,
             *args, **kwargs):
         super(S3ToHiveTransfer, self).__init__(*args, **kwargs)
         self.s3_key = s3_key
@@ -105,28 +114,41 @@ class S3ToHiveTransfer(BaseOperator):
         self.wildcard_match = wildcard_match
         self.hive_cli_conn_id = hive_cli_conn_id
         self.s3_conn_id = s3_conn_id
+        self.input_compressed = input_compressed
+
+        if (self.check_headers and
+                not (self.field_dict is not None and self.headers)):
+            raise AirflowException("To check_headers provide " +
+                                   "field_dict and headers")
 
     def execute(self, context):
-        self.hive = HiveCliHook(hive_cli_conn_id=self.hive_cli_conn_id)
+        # Downloading file from S3
         self.s3 = S3Hook(s3_conn_id=self.s3_conn_id)
+        self.hive = HiveCliHook(hive_cli_conn_id=self.hive_cli_conn_id)
         logging.info("Downloading S3 file")
+
         if self.wildcard_match:
             if not self.s3.check_for_wildcard_key(self.s3_key):
-                raise AirflowException("No key matches {0}".format(self.s3_key))
+                raise AirflowException("No key matches {0}"
+                                       .format(self.s3_key))
             s3_key_object = self.s3.get_wildcard_key(self.s3_key)
         else:
             if not self.s3.check_for_key(self.s3_key):
                 raise AirflowException(
                     "The key {0} does not exists".format(self.s3_key))
             s3_key_object = self.s3.get_key(self.s3_key)
-        with NamedTemporaryFile("w") as f:
+        root, file_ext = os.path.splitext(s3_key_object.key)
+        with TemporaryDirectory(prefix='tmps32hive_') as tmp_dir,\
+                NamedTemporaryFile(mode="w",
+                                   dir=tmp_dir,
+                                   suffix=file_ext) as f:
             logging.info("Dumping S3 key {0} contents to local"
                          " file {1}".format(s3_key_object.key, f.name))
             s3_key_object.get_contents_to_file(f)
             f.flush()
             self.s3.connection.close()
             if not self.headers:
-                logging.info("Loading file into Hive")
+                logging.info("Loading file {0} into Hive".format(f.name))
                 self.hive.load_file(
                     f.name,
                     self.hive_table,
@@ -136,33 +158,88 @@ class S3ToHiveTransfer(BaseOperator):
                     delimiter=self.delimiter,
                     recreate=self.recreate)
             else:
-                with open(f.name, 'r') as tmpf:
-                    if self.check_headers:
-                        header_l = tmpf.readline()
-                        header_line = header_l.rstrip()
-                        header_list = header_line.split(self.delimiter)
-                        field_names = list(self.field_dict.keys())
-                        test_field_match = [h1.lower() == h2.lower() for h1, h2
-                                            in zip(header_list, field_names)]
-                        if not all(test_field_match):
-                            logging.warning("Headers do not match field names"
-                                            "File headers:\n {header_list}\n"
-                                            "Field names: \n {field_names}\n"
-                                            "".format(**locals()))
-                            raise AirflowException("Headers do not match the "
-                                            "field_dict keys")
-                    with NamedTemporaryFile("w") as f_no_headers:
-                        tmpf.seek(0)
-                        next(tmpf)
-                        for line in tmpf:
-                            f_no_headers.write(line)
-                        f_no_headers.flush()
-                        logging.info("Loading file without headers into Hive")
-                        self.hive.load_file(
-                            f_no_headers.name,
-                            self.hive_table,
-                            field_dict=self.field_dict,
-                            create=self.create,
-                            partition=self.partition,
-                            delimiter=self.delimiter,
-                            recreate=self.recreate)
+                # Decompressing file
+                if self.input_compressed:
+                    logging.info("Uncompressing file {0}".format(f.name))
+                    fn_uncompressed = uncompress_file(f.name,
+                                                      file_ext,
+                                                      tmp_dir)
+                    logging.info("Uncompressed to {0}".format(fn_uncompressed))
+                    # uncompressed file available now so deleting
+                    # compressed file to save disk space
+                    f.close()
+                else:
+                    fn_uncompressed = f.name
+
+                # Testing if header matches field_dict
+                if self.check_headers:
+                    logging.info("Matching file header against field_dict")
+                    header_list = self._get_top_row_as_list(fn_uncompressed)
+                    if not self._match_headers(header_list):
+                        raise AirflowException("Header check failed")
+
+                # Deleting top header row
+                logging.info("Removing header from file {0}".
+                             format(fn_uncompressed))
+                headless_file = (
+                    self._delete_top_row_and_compress(fn_uncompressed,
+                                                      file_ext,
+                                                      tmp_dir))
+                logging.info("Headless file {0}".format(headless_file))
+                logging.info("Loading file {0} into Hive".format(headless_file))
+                self.hive.load_file(headless_file,
+                                    self.hive_table,
+                                    field_dict=self.field_dict,
+                                    create=self.create,
+                                    partition=self.partition,
+                                    delimiter=self.delimiter,
+                                    recreate=self.recreate)
+
+    def _get_top_row_as_list(self, file_name):
+        with open(file_name, 'rt') as f:
+            header_line = f.readline().strip()
+            header_list = header_line.split(self.delimiter)
+            return header_list
+
+    def _match_headers(self, header_list):
+        if not header_list:
+            raise AirflowException("Unable to retrieve header row from file")
+        field_names = self.field_dict.keys()
+        if len(field_names) != len(header_list):
+            logging.warning("Headers count mismatch"
+                            "File headers:\n {header_list}\n"
+                            "Field names: \n {field_names}\n"
+                            "".format(**locals()))
+            return False
+        test_field_match = [h1.lower() == h2.lower()
+                            for h1, h2 in zip(header_list, field_names)]
+        if not all(test_field_match):
+            logging.warning("Headers do not match field names"
+                            "File headers:\n {header_list}\n"
+                            "Field names: \n {field_names}\n"
+                            "".format(**locals()))
+            return False
+        else:
+            return True
+
+    def _delete_top_row_and_compress(
+            self,
+            input_file_name,
+            output_file_ext,
+            dest_dir):
+        # When output_file_ext is not defined, file is not compressed
+        open_fn = open
+        if output_file_ext.lower() == '.gz':
+            open_fn = gzip.GzipFile
+        elif output_file_ext.lower() == '.bz2':
+            open_fn = bz2.BZ2File
+
+        os_fh_output, fn_output = \
+            tempfile.mkstemp(suffix=output_file_ext, dir=dest_dir)
+        with open(input_file_name, 'rb') as f_in,\
+                open_fn(fn_output, 'wb') as f_out:
+            f_in.seek(0)
+            next(f_in)
+            for line in f_in:
+                f_out.write(line)
+        return fn_output

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1c231333/airflow/utils/compression.py
----------------------------------------------------------------------
diff --git a/airflow/utils/compression.py b/airflow/utils/compression.py
new file mode 100644
index 0000000..9d0785f
--- /dev/null
+++ b/airflow/utils/compression.py
@@ -0,0 +1,38 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from tempfile import NamedTemporaryFile
+import shutil
+import gzip
+import bz2
+
+
+def uncompress_file(input_file_name, file_extension, dest_dir):
+    """
+    Uncompress gz and bz2 files
+    """
+    if file_extension.lower() not in ('.gz', '.bz2'):
+        raise NotImplementedError("Received {} format. Only gz and bz2 "
+                                  "files can currently be uncompressed."
+                                  .format(file_extension))
+    if file_extension.lower() == '.gz':
+        fmodule = gzip.GzipFile
+    elif file_extension.lower() == '.bz2':
+        fmodule = bz2.BZ2File
+    with fmodule(input_file_name, mode='rb') as f_compressed,\
+        NamedTemporaryFile(dir=dest_dir,
+                           mode='wb',
+                           delete=False) as f_uncompressed:
+        shutil.copyfileobj(f_compressed, f_uncompressed)
+    return f_uncompressed.name

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1c231333/tests/operators/__init__.py
----------------------------------------------------------------------
diff --git a/tests/operators/__init__.py b/tests/operators/__init__.py
index 63ff2a0..1fb0e5e 100644
--- a/tests/operators/__init__.py
+++ b/tests/operators/__init__.py
@@ -17,3 +17,4 @@ from .subdag_operator import *
 from .operators import *
 from .sensors import *
 from .hive_operator import *
+from .s3_to_hive_operator import *

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1c231333/tests/operators/s3_to_hive_operator.py
----------------------------------------------------------------------
diff --git a/tests/operators/s3_to_hive_operator.py b/tests/operators/s3_to_hive_operator.py
new file mode 100644
index 0000000..faab11e
--- /dev/null
+++ b/tests/operators/s3_to_hive_operator.py
@@ -0,0 +1,247 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import unittest
+try:
+    from unittest import mock
+except ImportError:
+    try:
+        import mock
+    except ImportError:
+        mock = None
+import logging
+from itertools import product
+from airflow.operators.s3_to_hive_operator import S3ToHiveTransfer
+from collections import OrderedDict
+from airflow.exceptions import AirflowException
+from tempfile import NamedTemporaryFile, mkdtemp
+import gzip
+import bz2
+import shutil
+import filecmp
+import errno
+
+
+class S3ToHiveTransferTest(unittest.TestCase):
+
+    def setUp(self):
+        self.fn = {}
+        self.task_id = 'S3ToHiveTransferTest'
+        self.s3_key = 'S32hive_test_file'
+        self.field_dict = OrderedDict([('Sno', 'BIGINT'), ('Some,Text', 'STRING')])
+        self.hive_table = 'S32hive_test_table'
+        self.delimiter = '\t'
+        self.create = True
+        self.recreate = True
+        self.partition = {'ds': 'STRING'}
+        self.headers = True
+        self.check_headers = True
+        self.wildcard_match = False
+        self.input_compressed = False
+        self.kwargs = {'task_id': self.task_id,
+                       's3_key': self.s3_key,
+                       'field_dict': self.field_dict,
+                       'hive_table': self.hive_table,
+                       'delimiter': self.delimiter,
+                       'create': self.create,
+                       'recreate': self.recreate,
+                       'partition': self.partition,
+                       'headers': self.headers,
+                       'check_headers': self.check_headers,
+                       'wildcard_match': self.wildcard_match,
+                       'input_compressed': self.input_compressed
+                       }
+        try:
+            header = "Sno\tSome,Text \n".encode()
+            line1 = "1\tAirflow Test\n".encode()
+            line2 = "2\tS32HiveTransfer\n".encode()
+            self.tmp_dir = mkdtemp(prefix='test_tmps32hive_')
+            # create sample txt, gz and bz2 with and without headers
+            with NamedTemporaryFile(mode='wb+',
+                                    dir=self.tmp_dir,
+                                    delete=False) as f_txt_h:
+                self._set_fn(f_txt_h.name, '.txt', True)
+                f_txt_h.writelines([header, line1, line2])
+            fn_gz = self._get_fn('.txt', True) + ".gz"
+            with gzip.GzipFile(filename=fn_gz,
+                               mode="wb") as f_gz_h:
+                self._set_fn(fn_gz, '.gz', True)
+                f_gz_h.writelines([header, line1, line2])
+            fn_bz2 = self._get_fn('.txt', True) + '.bz2'
+            with bz2.BZ2File(filename=fn_bz2,
+                             mode="wb") as f_bz2_h:
+                self._set_fn(fn_bz2, '.bz2', True)
+                f_bz2_h.writelines([header, line1, line2])
+            # create sample txt, bz and bz2 without header
+            with NamedTemporaryFile(mode='wb+',
+                                    dir=self.tmp_dir,
+                                    delete=False) as f_txt_nh:
+                self._set_fn(f_txt_nh.name, '.txt', False)
+                f_txt_nh.writelines([line1, line2])
+            fn_gz = self._get_fn('.txt', False) + ".gz"
+            with gzip.GzipFile(filename=fn_gz,
+                               mode="wb") as f_gz_nh:
+                self._set_fn(fn_gz, '.gz', False)
+                f_gz_nh.writelines([line1, line2])
+            fn_bz2 = self._get_fn('.txt', False) + '.bz2'
+            with bz2.BZ2File(filename=fn_bz2,
+                             mode="wb") as f_bz2_nh:
+                self._set_fn(fn_bz2, '.bz2', False)
+                f_bz2_nh.writelines([line1, line2])
+        # Base Exception so it catches Keyboard Interrupt
+        except BaseException as e:
+            logging.error(e)
+            self.tearDown()
+
+    def tearDown(self):
+        try:
+            shutil.rmtree(self.tmp_dir)
+        except OSError as e:
+            # ENOENT - no such file or directory
+            if e.errno != errno.ENOENT:
+                raise e
+
+    # Helper method to create a dictionary of file names and
+    # file types (file extension and header)
+    def _set_fn(self, fn, ext, header):
+        key = self._get_key(ext, header)
+        self.fn[key] = fn
+
+    # Helper method to fetch a file of a
+    # certain format (file extension and header)
+    def _get_fn(self, ext, header):
+        key = self._get_key(ext, header)
+        return self.fn[key]
+
+    def _get_key(self, ext, header):
+        key = ext + "_" + ('h' if header else 'nh')
+        return key
+
+    def _cp_file_contents(self, fn_src, fn_dest):
+        with open(fn_src, 'rb') as f_src, open(fn_dest, 'wb') as f_dest:
+            shutil.copyfileobj(f_src, f_dest)
+
+    def _check_file_equality(self, fn_1, fn_2, ext):
+        # gz files contain mtime and filename in the header, which
+        # causes filecmp to return False even if the contents are identical.
+        # Hence decompress before testing for equality.
+        if ext == '.gz':
+            with gzip.GzipFile(fn_1, 'rb') as f_1,\
+                 NamedTemporaryFile(mode='wb') as f_txt_1,\
+                 gzip.GzipFile(fn_2, 'rb') as f_2,\
+                 NamedTemporaryFile(mode='wb') as f_txt_2:
+                shutil.copyfileobj(f_1, f_txt_1)
+                shutil.copyfileobj(f_2, f_txt_2)
+                f_txt_1.flush()
+                f_txt_2.flush()
+                return filecmp.cmp(f_txt_1.name, f_txt_2.name, shallow=False)
+        else:
+            return filecmp.cmp(fn_1, fn_2, shallow=False)
+
+    def test_bad_parameters(self):
+        self.kwargs['check_headers'] = True
+        self.kwargs['headers'] = False
+        self.assertRaisesRegexp(AirflowException,
+                                "To check_headers.*",
+                                S3ToHiveTransfer,
+                                **self.kwargs)
+
+    def test__get_top_row_as_list(self):
+        self.kwargs['delimiter'] = '\t'
+        fn_txt = self._get_fn('.txt', True)
+        header_list = S3ToHiveTransfer(**self.kwargs).\
+            _get_top_row_as_list(fn_txt)
+        self.assertEqual(header_list, ['Sno', 'Some,Text'],
+                         msg="Top row from file doesnt matched expected value")
+
+        self.kwargs['delimiter'] = ','
+        header_list = S3ToHiveTransfer(**self.kwargs).\
+            _get_top_row_as_list(fn_txt)
+        self.assertEqual(header_list, ['Sno\tSome', 'Text'],
+                         msg="Top row from file doesnt matched expected value")
+
+    def test__match_headers(self):
+        self.kwargs['field_dict'] = OrderedDict([('Sno', 'BIGINT'),
+                                                ('Some,Text', 'STRING')])
+        self.assertTrue(S3ToHiveTransfer(**self.kwargs).
+                        _match_headers(['Sno', 'Some,Text']),
+                        msg="Header row doesnt match expected value")
+        # Testing with different column order
+        self.assertFalse(S3ToHiveTransfer(**self.kwargs).
+                         _match_headers(['Some,Text', 'Sno']),
+                         msg="Header row doesnt match expected value")
+        # Testing with extra column in header
+        self.assertFalse(S3ToHiveTransfer(**self.kwargs).
+                         _match_headers(['Sno', 'Some,Text', 'ExtraColumn']),
+                         msg="Header row doesnt match expected value")
+
+    def test__delete_top_row_and_compress(self):
+        s32hive = S3ToHiveTransfer(**self.kwargs)
+        # Testing gz file type
+        fn_txt = self._get_fn('.txt', True)
+        gz_txt_nh = s32hive._delete_top_row_and_compress(fn_txt,
+                                                         '.gz',
+                                                         self.tmp_dir)
+        fn_gz = self._get_fn('.gz', False)
+        self.assertTrue(self._check_file_equality(gz_txt_nh, fn_gz, '.gz'),
+                        msg="gz Compressed file not as expected")
+        # Testing bz2 file type
+        bz2_txt_nh = s32hive._delete_top_row_and_compress(fn_txt,
+                                                          '.bz2',
+                                                          self.tmp_dir)
+        fn_bz2 = self._get_fn('.bz2', False)
+        self.assertTrue(self._check_file_equality(bz2_txt_nh, fn_bz2, '.bz2'),
+                        msg="bz2 Compressed file not as expected")
+
+    @unittest.skipIf(mock is None, 'mock package not present')
+    @mock.patch('airflow.operators.s3_to_hive_operator.HiveCliHook')
+    @mock.patch('airflow.operators.s3_to_hive_operator.S3Hook')
+    def test_execute(self, mock_s3hook, mock_hiveclihook):
+        # Testing txt, gz, bz2 files with and without header row
+        for test in product(['.txt', '.gz', '.bz2'], [True, False]):
+            ext = test[0]
+            has_header = test[1]
+            self.kwargs['headers'] = has_header
+            self.kwargs['check_headers'] = has_header
+            logging.info("Testing {0} format {1} header".
+                         format(ext,
+                                ('with' if has_header else 'without'))
+                         )
+            self.kwargs['input_compressed'] = (ext != '.txt')
+            self.kwargs['s3_key'] = self.s3_key + ext
+            ip_fn = self._get_fn(ext, self.kwargs['headers'])
+            op_fn = self._get_fn(ext, False)
+            # Mock s3 object returned by S3Hook
+            mock_s3_object = mock.Mock(key=self.kwargs['s3_key'])
+            mock_s3_object.get_contents_to_file.side_effect = \
+                lambda dest_file: \
+                self._cp_file_contents(ip_fn, dest_file.name)
+            mock_s3hook().get_key.return_value = mock_s3_object
+            # the file parameter passed to HiveCliHook.load_file is compared
+            # against the expected output file
+            mock_hiveclihook().load_file.side_effect = \
+                lambda *args, **kwargs: \
+                self.assertTrue(
+                    self._check_file_equality(args[0],
+                                              op_fn,
+                                              ext
+                                              ),
+                    msg='{0} output file not as expected'.format(ext))
+            # Execute S3ToHiveTransfer
+            s32hive = S3ToHiveTransfer(**self.kwargs)
+            s32hive.execute(None)
+
+
+if __name__ == '__main__':
+    unittest.main()
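
For context, here is a minimal usage sketch of the operator exercised by the test
above (not part of this commit; the DAG name, schedule, S3 key, partition value
and Hive table below are assumptions, while the parameter names mirror
self.kwargs in the test):

    # Hypothetical DAG wiring for S3ToHiveTransfer; illustrative only.
    from collections import OrderedDict
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.s3_to_hive_operator import S3ToHiveTransfer

    dag = DAG('example_s3_to_hive',              # assumed dag_id
              start_date=datetime(2017, 1, 1),
              schedule_interval='@daily')

    load_to_hive = S3ToHiveTransfer(
        task_id='s3_to_hive',
        s3_key='data/sample.txt.gz',             # assumed S3 key
        field_dict=OrderedDict([('Sno', 'BIGINT'),
                                ('Some,Text', 'STRING')]),
        hive_table='sample_table',               # assumed Hive table name
        delimiter='\t',
        create=True,
        recreate=True,
        partition={'ds': '2017-01-01'},          # assumed partition value
        headers=True,
        check_headers=True,
        wildcard_match=False,
        input_compressed=True,
        dag=dag)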

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1c231333/tests/utils/compression.py
----------------------------------------------------------------------
diff --git a/tests/utils/compression.py b/tests/utils/compression.py
new file mode 100644
index 0000000..f8e0ebb
--- /dev/null
+++ b/tests/utils/compression.py
@@ -0,0 +1,97 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from airflow.utils import compression
+import unittest
+from tempfile import NamedTemporaryFile, mkdtemp
+import bz2
+import gzip
+import shutil
+import logging
+import errno
+import filecmp
+
+
+class Compression(unittest.TestCase):
+
+    def setUp(self):
+        self.fn = {}
+        try:
+            header = "Sno\tSome,Text \n".encode()
+            line1 = "1\tAirflow Test\n".encode()
+            line2 = "2\tCompressionUtil\n".encode()
+            self.tmp_dir = mkdtemp(prefix='test_utils_compression_')
+            # create sample txt, gz and bz2 files
+            with NamedTemporaryFile(mode='wb+',
+                                    dir=self.tmp_dir,
+                                    delete=False) as f_txt:
+                self._set_fn(f_txt.name, '.txt')
+                f_txt.writelines([header, line1, line2])
+            fn_gz = self._get_fn('.txt') + ".gz"
+            with gzip.GzipFile(filename=fn_gz,
+                               mode="wb") as f_gz:
+                self._set_fn(fn_gz, '.gz')
+                f_gz.writelines([header, line1, line2])
+            fn_bz2 = self._get_fn('.txt') + '.bz2'
+            with bz2.BZ2File(filename=fn_bz2,
+                             mode="wb") as f_bz2:
+                self._set_fn(fn_bz2, '.bz2')
+                f_bz2.writelines([header, line1, line2])
+        # catch BaseException so that KeyboardInterrupt is also handled
+        except BaseException as e:
+            logging.error(e)
+            self.tearDown()
+
+    def tearDown(self):
+        try:
+            shutil.rmtree(self.tmp_dir)
+        except OSError as e:
+            # ENOENT - no such file or directory
+            if e.errno != errno.ENOENT:
+                raise e
+
+    # Helper method to register a file name under its
+    # file extension
+    def _set_fn(self, fn, ext):
+        self.fn[ext] = fn
+
+    # Helper method to fetch a file of a
+    # certain extension
+    def _get_fn(self, ext):
+        return self.fn[ext]
+
+    def test_uncompress_file(self):
+        # Testing txt file type
+        self.assertRaisesRegexp(NotImplementedError,
+                                "^Received .txt format. Only gz and bz2.*",
+                                compression.uncompress_file,
+                                **{'input_file_name': None,
+                                   'file_extension': '.txt',
+                                   'dest_dir': None
+                                   })
+        # Testing gz file type
+        fn_txt = self._get_fn('.txt')
+        fn_gz = self._get_fn('.gz')
+        txt_gz = compression.uncompress_file(fn_gz, '.gz', self.tmp_dir)
+        self.assertTrue(filecmp.cmp(txt_gz, fn_txt, shallow=False),
+                        msg="Uncompressed file doest match original")
+        # Testing bz2 file type
+        fn_bz2 = self._get_fn('.bz2')
+        txt_bz2 = compression.uncompress_file(fn_bz2, '.bz2', self.tmp_dir)
+        self.assertTrue(filecmp.cmp(txt_bz2, fn_txt, shallow=False),
+                        msg="Uncompressed file doest match original")
+
+
+if __name__ == '__main__':
+    unittest.main()
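
For reference, a minimal sketch of calling the helper these tests exercise
(the file paths are assumptions; the argument names mirror the kwargs used in
test_uncompress_file above):

    from airflow.utils import compression

    # Decompresses the .gz file into dest_dir and returns the path of the
    # resulting temporary file.
    txt_path = compression.uncompress_file(input_file_name='/tmp/data.txt.gz',
                                           file_extension='.gz',
                                           dest_dir='/tmp')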


[27/45] incubator-airflow git commit: [AIRFLOW-919] Running tasks with no start date shouldn't break a DAG's UI

Posted by bo...@apache.org.
[AIRFLOW-919] Running tasks with no start date shouldn't break a DAG's UI

Please accept this PR that addresses the following
issues:
- https://issues.apache.org/jira/browse/AIRFLOW-919

I also made the airflow PR template a little bit
less verbose (requires fewer edits when creating a
PR).

Testing Done:
- Ran a webserver with this case and made sure
that the DAG page loaded

Closes #2110 from
aoen/ddavydov/fix_running_task_with_no_start_date


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/ab37f8d3
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/ab37f8d3
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/ab37f8d3

Branch: refs/heads/v1-8-stable
Commit: ab37f8d32ef9dcf3163a037b53ca749f2f99f22e
Parents: 01494fd
Author: Dan Davydov <da...@airbnb.com>
Authored: Mon Feb 27 13:43:25 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:20:00 2017 -0700

----------------------------------------------------------------------
 .github/PULL_REQUEST_TEMPLATE.md | 6 +-----
 airflow/www/views.py             | 3 ++-
 2 files changed, 3 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ab37f8d3/.github/PULL_REQUEST_TEMPLATE.md
----------------------------------------------------------------------
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 5681a89..b92e29a 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,9 +1,5 @@
-Dear Airflow Maintainers,
-
 Please accept this PR that addresses the following issues:
-- *(replace with a link to AIRFLOW-X)*
-
-Per Apache guidelines you need to create a [Jira issue](https://issues.apache.org/jira/browse/AIRFLOW/).
+- *(MANDATORY - replace with a link to JIRA - e.g. https://issues.apache.org/jira/browse/AIRFLOW-XXX)*
 
 Testing Done:
 - Unittests are required, if you do not include new unit tests please

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ab37f8d3/airflow/www/views.py
----------------------------------------------------------------------
diff --git a/airflow/www/views.py b/airflow/www/views.py
index bda4921..86b1291 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -1205,7 +1205,8 @@ class Airflow(BaseView):
                 children_key = "_children"
 
             def set_duration(tid):
-                if isinstance(tid, dict) and tid.get("state") == State.RUNNING:
+                if (isinstance(tid, dict) and tid.get("state") == State.RUNNING and
+                        tid["start_date"] is not None):
                     d = datetime.now() - dateutil.parser.parse(tid["start_date"])
                     tid["duration"] = d.total_seconds()
                 return tid


[25/45] incubator-airflow git commit: [AIRFLOW-897] Prevent dagruns from failing with unfinished tasks

Posted by bo...@apache.org.
[AIRFLOW-897] Prevent dagruns from failing with unfinished tasks

Closes #2099 from
aoen/ddavydov/fix_premature_dagrun_failures


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/c29af466
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/c29af466
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/c29af466

Branch: refs/heads/v1-8-stable
Commit: c29af4668a67b5d7f969140549558714fb7b32c9
Parents: ff0fa00
Author: Dan Davydov <da...@airbnb.com>
Authored: Fri Feb 24 14:29:11 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:17:40 2017 -0700

----------------------------------------------------------------------
 airflow/models.py             |  6 +++---
 tests/dags/test_issue_1225.py | 13 +++++++++++++
 tests/jobs.py                 | 24 ++++++++++++++++++++++++
 3 files changed, 40 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/c29af466/airflow/models.py
----------------------------------------------------------------------
diff --git a/airflow/models.py b/airflow/models.py
index 1829ff3..3fef407 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -3993,12 +3993,12 @@ class DagRun(Base):
 
         # future: remove the check on adhoc tasks (=active_tasks)
         if len(tis) == len(dag.active_tasks):
-            # if any roots failed, the run failed
             root_ids = [t.task_id for t in dag.roots]
             roots = [t for t in tis if t.task_id in root_ids]
 
-            if any(r.state in (State.FAILED, State.UPSTREAM_FAILED)
-                   for r in roots):
+            # if all roots finished and at least one failed, the run failed
+            if (not unfinished_tasks and
+                    any(r.state in (State.FAILED, State.UPSTREAM_FAILED) for r in roots)):
                 logging.info('Marking run {} failed'.format(self))
                 self.state = State.FAILED
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/c29af466/tests/dags/test_issue_1225.py
----------------------------------------------------------------------
diff --git a/tests/dags/test_issue_1225.py b/tests/dags/test_issue_1225.py
index 021561f..d01fd79 100644
--- a/tests/dags/test_issue_1225.py
+++ b/tests/dags/test_issue_1225.py
@@ -129,3 +129,16 @@ dag7_subdag1 = SubDagOperator(
     subdag=subdag7)
 subdag7_task1.set_downstream(subdag7_task2)
 subdag7_task2.set_downstream(subdag7_task3)
+
+# DAG to test that a DAG run with an unfinished task and a failed root task stays marked as running
+dag8 = DAG(dag_id='test_dagrun_states_root_fail_unfinished', default_args=default_args)
+dag8_task1 = DummyOperator(
+    task_id='test_dagrun_unfinished',  # The test unsets this task instance's
+                                       # state after running it
+    dag=dag8,
+)
+dag8_task2 = PythonOperator(
+    task_id='test_dagrun_fail',
+    dag=dag8,
+    python_callable=fail,
+)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/c29af466/tests/jobs.py
----------------------------------------------------------------------
diff --git a/tests/jobs.py b/tests/jobs.py
index e520b44..1f7950e 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -358,6 +358,30 @@ class SchedulerJobTest(unittest.TestCase):
             },
             dagrun_state=State.FAILED)
 
+    def test_dagrun_root_fail_unfinished(self):
+        """
+        DagRuns with one unfinished and one failed root task -> RUNNING
+        """
+        # Run both the failed and successful tasks
+        scheduler = SchedulerJob(**self.default_scheduler_args)
+        dag_id = 'test_dagrun_states_root_fail_unfinished'
+        dag = self.dagbag.get_dag(dag_id)
+        dag.clear()
+        dr = scheduler.create_dag_run(dag)
+        try:
+            dag.run(start_date=dr.execution_date, end_date=dr.execution_date)
+        except AirflowException:  # Expect an exception since there is a failed task
+            pass
+
+        # Mark the successful task as never having run since we want to see if the
+        # dagrun will be in a running state despite having an unfinished task.
+        session = settings.Session()
+        ti = dr.get_task_instance('test_dagrun_unfinished', session=session)
+        ti.state = State.NONE
+        session.commit()
+        dr_state = dr.update_state()
+        self.assertEqual(dr_state, State.RUNNING)
+
     def test_dagrun_deadlock_ignore_depends_on_past_advance_ex_date(self):
         """
         DagRun is marked a success if ignore_first_depends_on_past=True


[07/45] incubator-airflow git commit: Merge branch 'v1-8-test' of https://git-wip-us.apache.org/repos/asf/incubator-airflow into v1-8-test

Posted by bo...@apache.org.
Merge branch 'v1-8-test' of https://git-wip-us.apache.org/repos/asf/incubator-airflow into v1-8-test


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/7925bed6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/7925bed6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/7925bed6

Branch: refs/heads/v1-8-stable
Commit: 7925bed63991da78cc63909a005d3dd9abd813ac
Parents: b3d4e71 fb88c2d
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Fri Feb 10 14:54:03 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Fri Feb 10 14:54:03 2017 +0100

----------------------------------------------------------------------
 airflow/api/client/local_client.py             |   2 +-
 airflow/api/common/experimental/trigger_dag.py |   9 +-
 tests/__init__.py                              |   1 +
 tests/api/__init__.py                          |  17 ++++
 tests/api/client/__init__.py                   |  13 +++
 tests/api/client/local_client.py               | 107 ++++++++++++++++++++
 6 files changed, 144 insertions(+), 5 deletions(-)
----------------------------------------------------------------------



[37/45] incubator-airflow git commit: [AIRFLOW-932][AIRFLOW-921][AIRFLOW-910] Do not mark tasks removed when backfilling

Posted by bo...@apache.org.
[AIRFLOW-932][AIRFLOW-921][AIRFLOW-910] Do not mark tasks removed when backfilling

In a backfill one can specify a specific task to
execute. We create a subset of the original tasks
in a subdag derived from the original dag. The
subdag has the same name as the original dag.
This breaks the integrity check of a dag_run, as
tasks are suddenly no longer in scope. A sketch
of the scenario follows below.
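
As a rough sketch of that scenario (names taken from the test added in this
patch; illustrative only, not part of the change itself):

    # Backfill only the "leave*" tasks of an existing DAG; the resulting
    # sub_dag keeps the original dag_id, so without this change the DagRun
    # integrity check would mark the out-of-scope tasks as REMOVED.
    sub_dag = dag.sub_dag(task_regex="leave*",
                          include_upstream=False,
                          include_downstream=False)
    BackfillJob(dag=sub_dag,
                start_date=DEFAULT_DATE,
                end_date=DEFAULT_DATE,
                executor=executor).run()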

Closes #2122 from bolkedebruin/AIRFLOW-921


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a8f2c27e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a8f2c27e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a8f2c27e

Branch: refs/heads/v1-8-stable
Commit: a8f2c27ed44449e6611c7c4a9ec8cf2371cf0987
Parents: dacc69a
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Sat Mar 11 10:52:07 2017 -0800
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 08:34:22 2017 -0700

----------------------------------------------------------------------
 airflow/jobs.py   |  1 +
 airflow/models.py | 12 +++++++++++-
 tests/jobs.py     | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a8f2c27e/airflow/jobs.py
----------------------------------------------------------------------
diff --git a/airflow/jobs.py b/airflow/jobs.py
index 36548c2..c61b229 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1803,6 +1803,7 @@ class BackfillJob(BaseJob):
 
+            # explicitly mark as running since we can fill gaps
             run.state = State.RUNNING
+            run.run_id = run_id
             run.verify_integrity(session=session)
 
             # check if we have orphaned tasks

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a8f2c27e/airflow/models.py
----------------------------------------------------------------------
diff --git a/airflow/models.py b/airflow/models.py
index e63da3e..32c52ac 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -2681,6 +2681,8 @@ class DAG(BaseDag, LoggingMixin):
         self.orientation = orientation
         self.catchup = catchup
 
+        self.partial = False
+
         self._comps = {
             'dag_id',
             'task_ids',
@@ -3186,6 +3188,10 @@ class DAG(BaseDag, LoggingMixin):
                 tid for tid in t._upstream_task_ids if tid in dag.task_ids]
             t._downstream_task_ids = [
                 tid for tid in t._downstream_task_ids if tid in dag.task_ids]
+
+        if len(dag.tasks) < len(self.tasks):
+            dag.partial = True
+
         return dag
 
     def has_task(self, task_id):
@@ -3946,6 +3952,9 @@ class DagRun(Base):
                 else:
                     tis = tis.filter(TI.state.in_(state))
 
+        if self.dag and self.dag.partial:
+            tis = tis.filter(TI.task_id.in_(self.dag.task_ids))
+
         return tis.all()
 
     @provide_session
@@ -4006,6 +4015,7 @@ class DagRun(Base):
         """
 
         dag = self.get_dag()
+
         tis = self.get_task_instances(session=session)
 
         logging.info("Updating state for {} considering {} task(s)"
@@ -4090,7 +4100,7 @@ class DagRun(Base):
             try:
                 dag.get_task(ti.task_id)
             except AirflowException:
-                if self.state is not State.RUNNING:
+                if self.state is not State.RUNNING and not dag.partial:
                     ti.state = State.REMOVED
 
         # check for missing tasks

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a8f2c27e/tests/jobs.py
----------------------------------------------------------------------
diff --git a/tests/jobs.py b/tests/jobs.py
index 1acf269..d208fd4 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -42,6 +42,8 @@ from tests.executor.test_executor import TestExecutor
 from airflow import configuration
 configuration.load_test_config()
 
+import sqlalchemy
+
 try:
     from unittest import mock
 except ImportError:
@@ -294,6 +296,53 @@ class BackfillJobTest(unittest.TestCase):
         self.assertEqual(ti.state, State.SUCCESS)
         dag.clear()
 
+    def test_sub_set_subdag(self):
+        dag = DAG(
+            'test_sub_set_subdag',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        with dag:
+            op1 = DummyOperator(task_id='leave1')
+            op2 = DummyOperator(task_id='leave2')
+            op3 = DummyOperator(task_id='upstream_level_1')
+            op4 = DummyOperator(task_id='upstream_level_2')
+            op5 = DummyOperator(task_id='upstream_level_3')
+            # order randomly
+            op2.set_downstream(op3)
+            op1.set_downstream(op3)
+            op4.set_downstream(op5)
+            op3.set_downstream(op4)
+
+        dag.clear()
+        dr = dag.create_dagrun(run_id="test",
+                               state=State.SUCCESS,
+                               execution_date=DEFAULT_DATE,
+                               start_date=DEFAULT_DATE)
+
+        executor = TestExecutor(do_update=True)
+        sub_dag = dag.sub_dag(task_regex="leave*",
+                              include_downstream=False,
+                              include_upstream=False)
+        job = BackfillJob(dag=sub_dag,
+                          start_date=DEFAULT_DATE,
+                          end_date=DEFAULT_DATE,
+                          executor=executor)
+        job.run()
+
+        self.assertRaises(sqlalchemy.orm.exc.NoResultFound, dr.refresh_from_db)
+        # the run_id should have changed, so a refresh won't work
+        drs = DagRun.find(dag_id=dag.dag_id, execution_date=DEFAULT_DATE)
+        dr = drs[0]
+
+        self.assertEqual(BackfillJob.ID_FORMAT_PREFIX.format(DEFAULT_DATE.isoformat()),
+                         dr.run_id)
+        for ti in dr.get_task_instances():
+            if ti.task_id == 'leave1' or ti.task_id == 'leave2':
+                self.assertEqual(State.SUCCESS, ti.state)
+            else:
+                self.assertEqual(State.NONE, ti.state)
+
 
 class SchedulerJobTest(unittest.TestCase):
     # These defaults make the test faster to run


[03/45] incubator-airflow git commit: [AIRFLOW-844] Fix cgroups directory creation

Posted by bo...@apache.org.
[AIRFLOW-844] Fix cgroups directory creation

Testing Done:
- Tested locally, we should add cgroup tests at
some point though

Closes #2057 from aoen/ddavydov/fix_cgroups


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/0b477900
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/0b477900
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/0b477900

Branch: refs/heads/v1-8-stable
Commit: 0b477900021e69f6a0ae8b5dd42b1465e9f836c5
Parents: ce3f88b
Author: Dan Davydov <da...@airbnb.com>
Authored: Mon Feb 6 16:21:05 2017 -0800
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Tue Feb 7 21:49:23 2017 +0100

----------------------------------------------------------------------
 airflow/contrib/task_runner/cgroup_task_runner.py | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/0b477900/airflow/contrib/task_runner/cgroup_task_runner.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/task_runner/cgroup_task_runner.py b/airflow/contrib/task_runner/cgroup_task_runner.py
index 79aafc8..6a9e6cf 100644
--- a/airflow/contrib/task_runner/cgroup_task_runner.py
+++ b/airflow/contrib/task_runner/cgroup_task_runner.py
@@ -75,14 +75,12 @@ class CgroupTaskRunner(BaseTaskRunner):
             if path_element not in name_to_node:
                 self.logger.debug("Creating cgroup {} in {}"
                                   .format(path_element, node.path))
-                subprocess.check_output("sudo mkdir -p {}".format(path_element))
-                subprocess.check_output("sudo chown -R {} {}".format(
-                    self._cur_user, path_element))
+                node = node.create_cgroup(path_element)
             else:
                 self.logger.debug("Not creating cgroup {} in {} "
                                   "since it already exists"
                                   .format(path_element, node.path))
-            node = name_to_node[path_element]
+                node = name_to_node[path_element]
         return node
 
     def _delete_cgroup(self, path):


[42/45] incubator-airflow git commit: Remove remnants

Posted by bo...@apache.org.
Remove remnants


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3927e00d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3927e00d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3927e00d

Branch: refs/heads/v1-8-stable
Commit: 3927e00dc72f6f2d14e463ff8daba3e3bcb11b73
Parents: 8df046b
Author: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Authored: Sun Mar 12 10:33:49 2017 -0700
Committer: Bolke de Bruin <bo...@Bolkes-MacBook-Pro.local>
Committed: Sun Mar 12 10:33:49 2017 -0700

----------------------------------------------------------------------
 tests/executor/__init__.py      | 13 ---------
 tests/executor/test_executor.py | 56 ------------------------------------
 2 files changed, 69 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3927e00d/tests/executor/__init__.py
----------------------------------------------------------------------
diff --git a/tests/executor/__init__.py b/tests/executor/__init__.py
deleted file mode 100644
index a85b772..0000000
--- a/tests/executor/__init__.py
+++ /dev/null
@@ -1,13 +0,0 @@
-# -*- coding: utf-8 -*-
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3927e00d/tests/executor/test_executor.py
----------------------------------------------------------------------
diff --git a/tests/executor/test_executor.py b/tests/executor/test_executor.py
deleted file mode 100644
index 9ec6cd4..0000000
--- a/tests/executor/test_executor.py
+++ /dev/null
@@ -1,56 +0,0 @@
-# -*- coding: utf-8 -*-
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-from airflow.executors.base_executor import BaseExecutor
-from airflow.utils.state import State
-
-from airflow import settings
-
-
-class TestExecutor(BaseExecutor):
-    """
-    TestExecutor is used for unit testing purposes.
-    """
-    def __init__(self, do_update=False, *args, **kwargs):
-        self.do_update = do_update
-        self._running = []
-        self.history = []
-
-        super(TestExecutor, self).__init__(*args, **kwargs)
-
-    def execute_async(self, key, command, queue=None):
-        self.logger.debug("{} running task instances".format(len(self.running)))
-        self.logger.debug("{} in queue".format(len(self.queued_tasks)))
-
-    def heartbeat(self):
-        session = settings.Session()
-        if self.do_update:
-            self.history.append(list(self.queued_tasks.values()))
-            while len(self._running) > 0:
-                ti = self._running.pop()
-                ti.set_state(State.SUCCESS, session)
-            for key, val in list(self.queued_tasks.items()):
-                (command, priority, queue, ti) = val
-                ti.set_state(State.RUNNING, session)
-                self._running.append(ti)
-                self.queued_tasks.pop(key)
-
-        session.commit()
-        session.close()
-
-    def terminate(self):
-        pass
-
-    def end(self):
-        self.sync()
-


[15/45] incubator-airflow git commit: [AIRFLOW-899] Tasks in SCHEDULED state should be white in the UI instead of black

Posted by bo...@apache.org.
[AIRFLOW-899] Tasks in SCHEDULED state should be white in the UI instead of black

Closes #2100 from
aoen/ddavydov/fix_black_squares_in_ui

(cherry picked from commit daa405e2bd2e4d3538eea0ed951fdcdf6d8bc127)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3a5a3235
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3a5a3235
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3a5a3235

Branch: refs/heads/v1-8-stable
Commit: 3a5a3235d5ad77a116ea1ac2a3216af31900d703
Parents: 8ad9ab6
Author: Dan Davydov <da...@airbnb.com>
Authored: Thu Feb 23 23:50:19 2017 +0100
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Thu Feb 23 23:50:34 2017 +0100

----------------------------------------------------------------------
 airflow/www/static/tree.css | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3a5a3235/airflow/www/static/tree.css
----------------------------------------------------------------------
diff --git a/airflow/www/static/tree.css b/airflow/www/static/tree.css
index 1818250..9304bb1 100644
--- a/airflow/www/static/tree.css
+++ b/airflow/www/static/tree.css
@@ -38,7 +38,7 @@ rect.state {
     shape-rendering: crispEdges;
     cursor: pointer;
 }
-rect.null, rect.undefined {
+rect.null, rect.scheduled, rect.undefined {
     fill: white;
 }
 rect.success {