Posted to commits@airflow.apache.org by bo...@apache.org on 2017/02/01 15:55:23 UTC

[13/14] incubator-airflow git commit: [AIRFLOW-789] Update UPDATING.md

[AIRFLOW-789] Update UPDATING.md

Closes #2011 from bolkedebruin/AIRFLOW-789


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/c6483271
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/c6483271
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/c6483271

Branch: refs/heads/v1-8-test
Commit: c64832718bd1cf3ca772b7bb9c61b51a2e27a12b
Parents: 1accb54
Author: Bolke de Bruin <bo...@xs4all.nl>
Authored: Wed Feb 1 15:52:45 2017 +0000
Committer: Bolke de Bruin <bo...@xs4all.nl>
Committed: Wed Feb 1 15:52:50 2017 +0000

----------------------------------------------------------------------
 UPDATING.md        | 83 +++++++++++++++++++++++++++++++++++++++++++------
 airflow/version.py |  2 +-
 2 files changed, 75 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/c6483271/UPDATING.md
----------------------------------------------------------------------
diff --git a/UPDATING.md b/UPDATING.md
index e23fd57..337b711 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -3,12 +3,60 @@
 This file documents any backwards-incompatible changes in Airflow and
 assists people when migrating to a new version.
 
-
 ## Airflow 1.8
 
-### Changes to Behavior
+### Database
+The database schema needs to be upgraded. Make sure to shut down Airflow and make a backup of your database. To
+upgrade the schema, run `airflow upgradedb`.
+
+### Upgrade systemd unit files
+Systemd unit files have been updated. If you use systemd, please make sure to update these.
+
+> Please note that the webserver does not detach properly; this will be fixed in a future version.
+
+### Less forgiving scheduler on dynamic start_date
+Using a dynamic start_date (e.g. `start_date = datetime.now()`) is not considered a best practice. The 1.8.0 scheduler
+is less forgiving in this area. If you encounter DAGs not being scheduled, you can try using a fixed start_date and
+renaming your DAG. The last step is required to make sure you start with a clean slate; otherwise the old schedule can
+interfere.
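+
+As a minimal sketch (the DAG id and schedule here are hypothetical placeholders):
+
+```
+from datetime import datetime
+
+from airflow import DAG
+
+# A fixed start_date instead of datetime.now(); renaming the DAG (e.g. with a
+# version suffix) ensures the scheduler starts with a clean slate.
+dag = DAG(
+    dag_id='my_dag_v2',
+    start_date=datetime(2017, 1, 1),
+    schedule_interval='@daily',
+)
+```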
+
+### New and updated scheduler options
+Please read through these options; defaults have changed since 1.7.1.
+
+#### child_process_log_directory
+In order to increase the robustness of the scheduler, DAGs are now processed in their own process. Therefore each
+DAG has its own log file for the scheduler. These are placed in `child_process_log_directory`, which defaults to
+`<AIRFLOW_HOME>/scheduler/latest`. You will need to make sure these log files are removed.
+
+> DAG logs or processor logs ignore any command line settings for log file locations.
+
+#### run_duration
+Previously the command line option `num_runs` was used to let the scheduler terminate after a certain number of
+loops. This is now time-bound and defaults to `-1`, which means run continuously. See also `num_runs`.
+
+#### num_runs
+Previously, `num_runs` was used to let the scheduler terminate after a certain number of loops. Now `num_runs`
+specifies the number of times to try to schedule each DAG file within `run_duration` time. Defaults to `-1`, which
+means try indefinitely.
 
-#### New DAGs are paused by default
+#### min_file_process_interval
+The amount of time that must pass before an updated DAG is picked up from the filesystem.
+
+#### dag_dir_list_interval
+How often the scheduler should relist the contents of the DAG directory. If, while developing, your DAGs are not being
+picked up, have a look at this setting and decrease it if necessary.
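+
+For reference, a sketch of these options in the `[scheduler]` section of airflow.cfg; the values shown are
+illustrative, not recommendations:
+
+```
+[scheduler]
+child_process_log_directory = /var/log/airflow/scheduler
+run_duration = -1
+num_runs = -1
+min_file_process_interval = 0
+dag_dir_list_interval = 300
+```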
+
+#### catchup_by_default
+By default the scheduler will fill in any missing DAG Runs for the intervals between the last execution date and the
+current date. This setting changes that behavior to only execute the latest interval. This can also be specified per
+DAG as `catchup = False / True`. Command line backfills will still work.
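+
+A minimal per-DAG sketch (the DAG id is a hypothetical placeholder):
+
+```
+from datetime import datetime
+
+from airflow import DAG
+
+# Only the latest interval is scheduled; earlier intervals are not filled in.
+dag = DAG(
+    dag_id='no_catchup_example',
+    start_date=datetime(2017, 1, 1),
+    schedule_interval='@daily',
+    catchup=False,
+)
+```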
+
+### Faulty DAGs do not show an error in the Web UI
+
+Due to changes in the way Airflow processes DAGs, the Web UI does not show an error when processing a faulty DAG. To
+find processing errors, go to the `child_process_log_directory`, which defaults to `<AIRFLOW_HOME>/scheduler/latest`.
+
+### New DAGs are paused by default
 
 Previously, new DAGs would be scheduled immediately. To retain the old behavior, add this to airflow.cfg:
 
@@ -17,24 +65,41 @@ Previously, new DAGs would be scheduled immediately. To retain the old behavior,
 dags_are_paused_at_creation = False
 ```
 
-#### Google Cloud Operator and Hook alignment
+### Airflow context variables are passed to Hive config if conf is specified
+
+If you specify a hive conf to the run_cli command of the HiveHook, Airflow adds some
+convenience variables to the config. In case you run a secure Hadoop setup, it might be
+required to whitelist these variables by adding the following to your configuration:
+
+```
+<property> 
+     <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
+     <value>airflow\.ctx\..*</value>
+</property>
+```
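+
+A hedged sketch of the call this refers to, assuming `HiveCliHook.run_cli` accepts a `hive_conf` dict as in later
+releases (the connection id and query are hypothetical):
+
+```
+from airflow.hooks.hive_hooks import HiveCliHook
+
+hook = HiveCliHook(hive_cli_conn_id='hive_cli_default')
+# Passing a conf is what triggers the airflow.ctx.* convenience variables.
+hook.run_cli(hql='SELECT 1;', hive_conf={'mapred.job.queue.name': 'default'})
+```
+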
+### Google Cloud Operator and Hook alignment
 
-All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection type for all kinds of Google Cloud Operators.
+All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection 
+type for all kinds of Google Cloud Operators.
 
 If you experience problems connecting with your operator, make sure you set the connection type "Google Cloud Platform".
 
-Also the old P12 key file type is not supported anymore and only the new JSON key files are supported as a service account.
+Also, the old P12 key file type is not supported anymore; only the new JSON key files are supported as a service
+account.
 
 ### Deprecated Features
-These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer supported and will be removed entirely in Airflow 2.0
+These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer
+supported and will be removed entirely in Airflow 2.0.
 
 - Hooks and operators must be imported from their respective submodules
 
-  `airflow.operators.PigOperator` is no longer supported; `from airflow.operators.pig_operator import PigOperator` is. (AIRFLOW-31, AIRFLOW-200)
+  `airflow.operators.PigOperator` is no longer supported; `from airflow.operators.pig_operator import PigOperator` is. 
+  (AIRFLOW-31, AIRFLOW-200)
 
 - Operators no longer accept arbitrary arguments
 
-  Previously, `Operator.__init__()` accepted any arguments (either positional `*args` or keyword `**kwargs`) without complaint. Now, invalid arguments will be rejected. (https://github.com/apache/incubator-airflow/pull/1285)
+  Previously, `Operator.__init__()` accepted any arguments (either positional `*args` or keyword `**kwargs`) without
+  complaint. Now, invalid arguments will be rejected. (https://github.com/apache/incubator-airflow/pull/1285) A short
+  sketch of both changes follows this list.
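+
+A minimal sketch, using `BashOperator` as an arbitrary example (the DAG and task ids are placeholders):
+
+```
+from datetime import datetime
+
+from airflow import DAG
+# Import from the submodule, not from airflow.operators directly.
+from airflow.operators.bash_operator import BashOperator
+
+dag = DAG(dag_id='deprecation_example', start_date=datetime(2017, 1, 1))
+
+# Only documented arguments; arbitrary extra kwargs are now rejected.
+task = BashOperator(task_id='say_hello', bash_command='echo hello', dag=dag)
+```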
 
 ## Airflow 1.7.1.2
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/c6483271/airflow/version.py
----------------------------------------------------------------------
diff --git a/airflow/version.py b/airflow/version.py
index 0b45eae..b50d056 100644
--- a/airflow/version.py
+++ b/airflow/version.py
@@ -13,4 +13,4 @@
 # limitations under the License.
 #
 
-version = '1.8.0b1+apache.incubating'
+version = '1.8.0b3+apache.incubating'