Posted to commits@falcon.apache.org by aj...@apache.org on 2015/07/16 08:37:58 UTC

falcon git commit: FALCON-1204 Expose default configs for feed late data handling in runtime.properties. Contributed by Balu Vellanki.

Repository: falcon
Updated Branches:
  refs/heads/master 9066eac27 -> 09841bbea


FALCON-1204 Expose default configs for feed late data handling in runtime.properties. Contributed by Balu Vellanki.


Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/09841bbe
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/09841bbe
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/09841bbe

Branch: refs/heads/master
Commit: 09841bbeab843df681f70ca21eb1c856507149c2
Parents: 9066eac
Author: Ajay Yadava <aj...@gmail.com>
Authored: Thu Jul 16 12:06:48 2015 +0530
Committer: Ajay Yadava <aj...@gmail.com>
Committed: Thu Jul 16 12:06:48 2015 +0530

----------------------------------------------------------------------
 CHANGES.txt                                   |  2 ++
 common/src/main/resources/runtime.properties  |  7 ++++++-
 docs/src/site/twiki/FalconDocumentation.twiki | 12 +++++++++++-
 src/conf/runtime.properties                   | 11 +++++++++--
 4 files changed, 28 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 8b96e78..63298f0 100755
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -9,6 +9,8 @@ Trunk (Unreleased)
     FALCON-796 Enable users to triage data processing issues through falcon (Ajay Yadava)
     
   IMPROVEMENTS
+    FALCON-1204 Expose default configs for feed late data handling in runtime.properties(Balu Vellanki via Ajay Yadava)
+
     FALCON-1170 Falcon Native Scheduler - Refactor existing workflow/coord/bundle builder(Pallavi Rao via Ajay Yadava)
     
     FALCON-1031 Make post processing notifications to user topics optional (Pallavi Rao via Ajay Yadava)

http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/common/src/main/resources/runtime.properties
----------------------------------------------------------------------
diff --git a/common/src/main/resources/runtime.properties b/common/src/main/resources/runtime.properties
index 8d465e8..3b32463 100644
--- a/common/src/main/resources/runtime.properties
+++ b/common/src/main/resources/runtime.properties
@@ -23,4 +23,9 @@
 
 *.falcon.replication.workflow.maxmaps=5
 *.falcon.replication.workflow.mapbandwidth=100
-webservices.default.max.results.per.page=100
+*.webservices.default.max.results.per.page=100
+
+# Default configs to handle replication for late arriving feeds.
+*.feed.late.allowed=true
+*.feed.late.frequency=hours(3)
+*.feed.late.policy=exp-backoff
\ No newline at end of file
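For reference, the entries added above use standard Java properties syntax, with a `*.` prefix scoping a value to all clusters. A minimal, illustrative sketch of parsing such entries (in Python purely for brevity; this is not Falcon code, which is Java):

```python
# Illustrative sketch: read Falcon-style runtime.properties lines and
# strip the "*." prefix that scopes a property to all clusters.
def parse_runtime_properties(text):
    """Parse properties text into a dict, dropping comments and blanks."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # A leading "*." means the property applies to every cluster/colo.
        props[key.strip().lstrip("*.")] = value.strip()
    return props

sample = """
# Default configs to handle replication for late arriving feeds.
*.feed.late.allowed=true
*.feed.late.frequency=hours(3)
*.feed.late.policy=exp-backoff
"""
defaults = parse_runtime_properties(sample)
```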

http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/docs/src/site/twiki/FalconDocumentation.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/FalconDocumentation.twiki b/docs/src/site/twiki/FalconDocumentation.twiki
index c374966..9804a57 100644
--- a/docs/src/site/twiki/FalconDocumentation.twiki
+++ b/docs/src/site/twiki/FalconDocumentation.twiki
@@ -561,7 +561,7 @@ simple and basic. The falcon system looks at all dependent input feeds for a pro
 cut-off period. It then uses a scheduled messaging framework, such as the one available in Apache ActiveMQ or Java's !DelayQueue, to schedule a message with the cut-off period. After the cut-off period the message is dequeued, and Falcon checks for changes in the feed data recorded in HDFS in the latedata file by Falcon's "record-size" action; if it detects any changes, the workflow is rerun with the new set of feed data.
 
 *Example:*
-The late rerun policy can be configured in the process definition.
+For a process entity, the late rerun policy can be configured in the process definition.
 Falcon supports 3 policies, periodic, exp-backoff and final.
 Delay specifies how often the feed data should be checked for changes. One also needs to
 explicitly set, in late-input, the feed names which need to be checked for late data.
@@ -575,6 +575,16 @@ explicitly set the feed names in late-input which needs to be checked for late d
 *NOTE:* Feeds configured with table storage do not support late input data handling at this point. This will be
 made available in the near future.
 
+For a feed entity replication job, the default late data handling policy can be configured in the runtime.properties file.
+Since these properties live in runtime.properties, changes take effect for all replication jobs completed after the change.
+<verbatim>
+  # Default configs to handle replication for late arriving feeds.
+  *.feed.late.allowed=true
+  *.feed.late.frequency=hours(3)
+  *.feed.late.policy=exp-backoff
+</verbatim>
+
+
 ---++ Idempotency
 All the operations in Falcon are idempotent: if you make the same request to the falcon server / prism again, you will get a SUCCESSFUL response if it was SUCCESSFUL on the first attempt. For example, you submit a new process / feed and get a SUCCESSFUL response. If you now run the same command / API request on the same entity, you will again get a SUCCESSFUL response. The same is true for other operations like schedule, kill, suspend and resume.
 Idempotency also takes care of the case where a request is sent through prism and fails on one or more servers. For example, suppose prism is configured to send requests to 3 servers. A user first sends a request to SUBMIT a process on all 3 of them and receives a SUCCESSFUL response from each. Then one of the servers goes down, and the user sends a request to SCHEDULE the submitted process. This time the user receives a response with PARTIAL status and a FAILURE message from the server that has gone down; checking the other 2 servers shows the process started and running on them. Once the issue with the failed server is fixed and it is brought back up, sending the SCHEDULE request again through prism results in a SUCCESSFUL response from prism as well as all three servers, but this time the process is SCHEDULED only on the server that had failed earlier, while the other two keep running as before.
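The documentation above names three late rerun policies: periodic, exp-backoff and final. As a rough, hypothetical illustration of the exp-backoff idea (not Falcon's actual Java implementation), the wait between successive late-data checks doubles on each attempt until the cut-off is reached:

```python
def exp_backoff_delays(initial_minutes, cutoff_minutes):
    """Yield check delays (in minutes) that double each attempt until
    the cut-off period is exhausted.

    Rough sketch of an exponential back-off late-rerun policy;
    Falcon's real policy implementations live in its Java rerun handlers.
    """
    delay, elapsed = initial_minutes, 0
    while elapsed + delay <= cutoff_minutes:
        yield delay
        elapsed += delay
        delay *= 2

# With an initial delay of hours(1) and a cut-off of hours(6),
# checks would run after 60 and then 120 more minutes.
delays = list(exp_backoff_delays(60, 360))
```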

http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/src/conf/runtime.properties
----------------------------------------------------------------------
diff --git a/src/conf/runtime.properties b/src/conf/runtime.properties
index a40d369..58dee3d 100644
--- a/src/conf/runtime.properties
+++ b/src/conf/runtime.properties
@@ -26,8 +26,15 @@
 #prism should have the following properties
 prism.all.colos=local
 prism.falcon.local.endpoint=https://localhost:15443
-#falcon server should have the following properties
+
+# falcon server should have the following properties
 falcon.current.colo=local
 webservices.default.max.results.per.page=100
+
 # retry count - to fetch the status from the workflow engine
-workflow.status.retry.count=30
\ No newline at end of file
+workflow.status.retry.count=30
+
+# Default configs to handle replication for late arriving feeds.
+feed.late.allowed=true
+feed.late.frequency=hours(3)
+feed.late.policy=exp-backoff
\ No newline at end of file
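The `feed.late.frequency` value above uses Falcon's frequency expression syntax (`minutes(x)`, `hours(x)`, `days(x)`, `months(x)`). A hedged sketch of turning such an expression into a concrete duration (illustrative only; the month-to-days mapping is an approximation assumed here, not Falcon's behaviour):

```python
import re
from datetime import timedelta

# Sketch of parsing a Falcon frequency expression such as "hours(3)".
# Unit names follow Falcon's documented syntax; the timedelta mapping
# is an illustrative assumption (months are approximated as 30 days).
_UNITS = {
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
    "months": timedelta(days=30),  # rough approximation
}

def parse_frequency(expr):
    match = re.fullmatch(r"(minutes|hours|days|months)\((\d+)\)", expr.strip())
    if not match:
        raise ValueError("not a frequency expression: %r" % expr)
    unit, count = match.groups()
    return int(count) * _UNITS[unit]

late_frequency = parse_frequency("hours(3)")
```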