Posted to commits@falcon.apache.org by ve...@apache.org on 2013/06/11 18:59:54 UTC

svn commit: r1491876 [2/2] - in /incubator/falcon: site/ site/docs/ site/images/ site/slides/ site/wiki/ trunk/ trunk/src/site/resources/images/ trunk/src/site/resources/slides/ trunk/src/site/twiki/ trunk/src/site/twiki/docs/

Modified: incubator/falcon/trunk/src/site/resources/slides/falcon-user-guide.html
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/src/site/resources/slides/falcon-user-guide.html?rev=1491876&r1=1491875&r2=1491876&view=diff
==============================================================================
--- incubator/falcon/trunk/src/site/resources/slides/falcon-user-guide.html (original)
+++ incubator/falcon/trunk/src/site/resources/slides/falcon-user-guide.html Tue Jun 11 16:59:54 2013
@@ -32,10 +32,11 @@
 <!-- Begin slides. Just make elements with a class of slide. -->
 
 <section class="slide" id="intro">
-    <h1>Apache Falcon - User Guide</h1>
+    <h2>Apache Falcon - User Guide</h2>
+    <h3>Coming soon .... </h3>
 </section>
 
-<section class="slide" id="build">
+<!--<section class="slide" id="build">
     <h2>Building Apache Falcon</h2>
     <ol>
         <li>
@@ -44,7 +45,7 @@
         </li>
         <li>
             <p>Compile the project</p>
-            <pre><code>mvn -DskipTests clean package</code></pre>
+            <pre><code>mvn -DskipTests clean package verify</code></pre>
         </li>
         <li>
             <p>Optionally run the tests</p>
@@ -93,7 +94,7 @@
             <p>TBD: Schedule a sample process</p>
         </li>
     </ul>
-</section>
+</section>-->
 
 <!-- End slides. -->
 

Modified: incubator/falcon/trunk/src/site/twiki/docs/FalconArchitecture.twiki
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/src/site/twiki/docs/FalconArchitecture.twiki?rev=1491876&r1=1491875&r2=1491876&view=diff
==============================================================================
--- incubator/falcon/trunk/src/site/twiki/docs/FalconArchitecture.twiki (original)
+++ incubator/falcon/trunk/src/site/twiki/docs/FalconArchitecture.twiki Tue Jun 11 16:59:54 2013
@@ -16,19 +16,19 @@
 ---++ Architecture
 ---+++ Introduction
 Falcon is a feed and process management platform over hadoop. Falcon essentially transforms user's feed
-and process configurations into repeated actions through a standard workflow engine. Falcon by itself
-doesn't do any heavy lifting. All the functions and workflow state management requirements are delegated
-to the workflow scheduler. The only thing that Falcon maintains is the dependencies and relationship between
-these entities. This is adequate to provide integrated and seamless experience to the developers using
+and process configurations into repeated actions through a standard workflow engine (Apache Oozie). Falcon
+by itself doesn't do any heavy lifting. All the functions and workflow state management requirements are
+delegated to the workflow scheduler. The only thing that Falcon maintains is the dependencies and relationship
+between these entities. This is adequate to provide integrated and seamless experience to the developers using
 the falcon platform.
 
 ---+++ Falcon Architecture - Overview
 <img src="../images/Architecture.png" height="400" width="600" />
 
 ---+++ Scheduler
-Falcon system has picked Oozie as the default scheduler. However the system is open for integration with
+Falcon system has picked Apache Oozie as the default scheduler. However the system is open for integration with
 other schedulers. A lot of the data processing in Hadoop requires scheduling to be based on both data availability
-as well as time. Oozie currently supports these capabilities off the shelf and hence the choice.
+as well as time. Apache Oozie currently supports these capabilities off the shelf and hence the choice.
 
 ---+++ Control flow
 Though the actual responsibility of the workflow is with the scheduler (Oozie), Falcon remains in the
@@ -85,89 +85,6 @@ individual operations performed are reco
 the overall user action. In some cases, it is not possible to undo the action. In such cases, Falcon attempts
 to keep the system in a consistent state.
 
----++ Entity Management actions
-
----+++ Submit:
-Entity submit action allows a new cluster/feed/process to be setup within Falcon. Submitted entity is not
-scheduled, meaning it would simply be in the configuration store within Falcon. Besides validating against
-the schema for the corresponding entity being added, the Falcon system would also perform inter-field
-validations within the configuration file and validations across dependent entities.
-
----+++ List:
-List all the entities within the falcon config store for the entity type being requested. This will include
-both scheduled and submitted entity configurations.
-
----+++ Dependency:
-Returns the dependencies of the requested entity. Dependency list include both forward and backward
-dependencies (depends on & is dependent on). For ex, a feed would show process that are dependent on the
-feed and the clusters that it depends on.'
-
----+++ Schedule:
-Feeds or Processes that are already submitted and present in the config store can be scheduled. Upon schedule,
-Falcon system wraps the required repeatable action as a bundle of oozie coordinators and executes them on the
-Oozie scheduler. (It is possible to extend Falcon to use an alternate workflow engine other than Oozie).
-Falcon overrides the workflow instance's external id in Oozie to reflect the process/feed and the nominal
-time. This external Id can then be used for instance management functions.
-
----+++ Suspend:
-This action is applicable only on scheduled entity. This triggers suspend on the oozie bundle that was
-scheduled earlier through the schedule function. No further instances are executed on a suspended process/feed.
-
----+++ Resume:
-Puts a suspended process/feed back to active, which in turn resumes applicable oozie bundle.
-
----+++ Status:
-Gets the current status of the entity.
-
----+++ Definition:
-Gets the current entity definition as stored in the configuration store. Please note that user documentations
-in the entity will not be retained.
-
----+++ Delete:
-Delete operation on the entity removes any scheduled activity on the workflow engine, besides removing the
-entity from the falcon configuration store. Delete operation on an entity would only succeed if there are
-no dependent entities on the deleted entity.
-
----+++ Update:
-Update operation allows an already submitted/scheduled entity to be updated. Cluster update is currently
-not allowed. Feed update can cause cascading update to all the processes already scheduled. The following
-set of actions are performed in Oozie to realize an update.
-
-   * Suspend the previously scheduled Oozie coordinator. This is prevent any new action from being triggered.
-   * Update the coordinator to set the end time to "now"
-   * Resume the suspended coordiantors
-   * Schedule as per the new process/feed definition with the start time as "now"
-
----++ Instance Management actions
-
-
-Instance Manager gives user the option to control individual instances of the process based on their instance start time (start time of that instance). Start time needs to be given in standard TZ format. Example:   01 Jan 2012 01:00  => 2012-01-01T01:00Z
-
-All the instance management operations (except running) allow single instance or list of instance within a Date range to be acted on. Make sure the dates are valid. i.e are within the start and  end time of process itself. 
-
-For every query in instance management the process name is a compulsory parameter. 
-
-Parameters -start and -end are used to mention the date range within which you want the instance to be operated upon. 
-
--start:   using only  "-start" without  "-end"  will conduct the desired operation only on single instance given by date along with start.
-
--end:  "-end"  can only be used along with "-start" . It corresponds to the end date till which instance need to operated upon. 
-
-   * 1. *status*: -status option via CLI can be used to get the status of a single or multiple instances.  If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state.Along with the status of the instance log location is also returned.
-
-
-   * 2.	*running*: -running returns all the running instance of the process. It does not take any start or end dates but simply return all the instances in state RUNNING at that given time. 
-
-   * 3.	*rerun*: -rerun is the option that you will use most often from instance management. As the name suggest this option is used to rerun a particular instance or instances of the process. The rerun option reruns all parent workflow for the instance, which in turn rerun all the sub-workflows for it. This option is valid for any instance in terminal state, i.e. KILLED, SUCCEEDED, FAILED. User can also set properties in the request, which will give options what types of actions should be rerun like, only failed, run all etc. These properties are dependent on the workflow engine being used along with falcon.
-   
-   * 4. *suspend*: -suspend is used to suspend a instance or instances  for the given process. This option pauses the parent workflow at the state, which it was in at the time of execution of this command. This command is similar to SUSPEND process command in functionality only difference being, SUSPEND process suspends all the instance whereas suspend instance suspend only that instance or instances in the range. 
-
-   * 5.	*resume*: -resume option is used to resume any instance that  is in suspended state.  (Note: due to a bug in oozie -resume option in some cases may not actually resume the suspended instance/ instances)
-   * 6. *kill*: -kill option can be used to kill an instance or multiple instances 
-
-
-In all the cases where your request is syntactically correct but logically not, the instance / instances are returned with the same status as earlier. Example:  trying to resume a KILLED  / SUCCEEDED instance will return the instance with KILLED / SUCCEEDED, without actually performing any operation. This is so because only an instance in SUSPENDED state can be resumed. Same thing is valid for rerun a SUSPENDED or RUNNING options etc. 
-
 ---++ Retention
 In coherence with its feed lifecycle management philosophy, Falcon allows the user to retain data in the system
 for a specific period of time for a scheduled feed. The user can specify the retention period in the respective 

Modified: incubator/falcon/trunk/src/site/twiki/docs/FalconCLI.twiki
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/src/site/twiki/docs/FalconCLI.twiki?rev=1491876&r1=1491875&r2=1491876&view=diff
==============================================================================
--- incubator/falcon/trunk/src/site/twiki/docs/FalconCLI.twiki (original)
+++ incubator/falcon/trunk/src/site/twiki/docs/FalconCLI.twiki Tue Jun 11 16:59:54 2013
@@ -6,82 +6,163 @@ FalconCLI is a interface between user an
 
 ---+++Submit
 
-Submit option is used to set up entity definition.
+Entity submit action allows a new cluster/feed/process to be set up within Falcon. A submitted entity is not
+scheduled, meaning it simply sits in the configuration store within Falcon. Besides validating against
+the schema for the corresponding entity being added, the Falcon system also performs inter-field
+validations within the configuration file and validations across dependent entities.
 
+<verbatim>
 Example: 
 $FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml
+</verbatim>
 
 Note: The url option in the above and all subsequent commands is optional. If not mentioned it will be picked from client.properties file. If the option is not provided and also not set in client.properties, Falcon CLI will fail.
 
 ---+++Schedule
 
-Once submitted, an entity can be scheduled using schedule option. Process and feed can only be scheduled.
+Feeds or Processes that are already submitted and present in the config store can be scheduled. Upon schedule,
+Falcon system wraps the required repeatable action as a bundle of oozie coordinators and executes them on the
+Oozie scheduler. (It is possible to extend Falcon to use an alternate workflow engine other than Oozie).
+Falcon overrides the workflow instance's external id in Oozie to reflect the process/feed and the nominal
+time. This external Id can then be used for instance management functions.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [process|feed] -name <<name>> -schedule
 
 Example:
 $FALCON_HOME/bin/falcon entity  -type process -name sampleProcess -schedule
+</verbatim>
 
 ---+++Suspend
 
-Suspend on an entity results in suspension of the oozie bundle that was scheduled earlier through the schedule function. No further instances are executed on a suspended entity. Only schedulable entities(process/feed) can be suspended.
+This action applies only to a scheduled entity. It triggers a suspend on the oozie bundle that was
+scheduled earlier through the schedule function. No further instances are executed on a suspended process/feed.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -suspend
+</verbatim>
 
 ---+++Resume
 
 Puts a suspended process/feed back to active, which in turn resumes the applicable oozie bundle.
 
+<verbatim>
 Usage:
  $FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -resume
+</verbatim>
 
 ---+++Delete
 
-Delete removes the submitted entity definition for the specified entity and put it into the archive.
+Delete operation on the entity removes any scheduled activity on the workflow engine, besides removing the
+entity from the falcon configuration store. A delete operation succeeds only if no other entities
+depend on the entity being deleted.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [cluster|feed|process] -name <<name>> -delete
+</verbatim>
 
 ---+++List
 
-Entities of a particular type can be listed with list sub-command.
+Lists all the entities within the falcon config store for the entity type being requested. This includes
+both scheduled and submitted entity configurations.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -list
+</verbatim>
 
 ---+++Update
 
 Update operation allows an already submitted/scheduled entity to be updated. Cluster update is currently
-not allowed.
+not allowed. A feed update can cause a cascading update to all the processes already scheduled. The following
+set of actions is performed in Oozie to realize an update.
 
+   * Suspend the previously scheduled Oozie coordinator. This is to prevent any new action from being triggered.
+   * Update the coordinator to set the end time to "now"
+   * Resume the suspended coordinators
+   * Schedule as per the new process/feed definition with the start time as "now"
+
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -update
+</verbatim>
 
 ---+++Status
 
 Status returns the current status of the entity.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -status
+</verbatim>
 
 ---+++Dependency
 
-With the use of dependency option, we can list all the entities on which the specified entity is dependent. For example for a feed, dependency return the cluster name and for process it returns all the input feeds, output feeds and cluster names.
+Returns the dependencies of the requested entity. The dependency list includes both forward and backward
+dependencies (depends on & is dependent on). For example, a feed would show the processes that are dependent on the
+feed and the clusters that it depends on.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -dependency
+</verbatim>
 
 ---+++Definition
 
-Definition option returns the entity definition submitted earlier during submit step.
+Gets the current entity definition as stored in the configuration store. Please note that user documentation
+in the entity will not be retained.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -definition
+</verbatim>
 
 ---++Instance Management Options
 
+Instance Manager gives the user the option to control individual instances of the process based on their instance start time (start time of that instance). Start time needs to be given in standard TZ format. Example:   01 Jan 2012 01:00  => 2012-01-01T01:00Z
+
+All the instance management operations (except running) allow a single instance or a list of instances within a date range to be acted on. Make sure the dates are valid, i.e. within the start and end time of the process itself.
+
+For every query in instance management, the process name is a compulsory parameter.
+
+The parameters -start and -end specify the date range within which instances are operated upon.
+
+-start:   using only "-start" without "-end" performs the operation on the single instance identified by the given start date.
+
+-end:  "-end" can only be used along with "-start". It is the end date up to which instances are operated upon.
+
+   * 1. *status*: -status option via CLI can be used to get the status of a single or multiple instances.  If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Along with the status of the instance, the log location is also returned.
+
+
+   * 2. *running*: -running returns all the running instances of the process. It does not take any start or end dates but simply returns all the instances in state RUNNING at that given time.
+
+   * 3. *rerun*: -rerun is the option that you will use most often from instance management. As the name suggests, this option is used to rerun a particular instance or instances of the process. The rerun option reruns all parent workflows for the instance, which in turn rerun all the sub-workflows for it. This option is valid for any instance in a terminal state, i.e. KILLED, SUCCEEDED, FAILED. The user can also set properties in the request to choose which types of actions should be rerun, e.g. only failed, run all, etc. These properties are dependent on the workflow engine being used along with falcon.
+
+   * 4. *suspend*: -suspend is used to suspend an instance or instances for the given process. This option pauses the parent workflow at the state it was in when the command was executed. This command is similar in functionality to the SUSPEND process command; the only difference is that SUSPEND process suspends all the instances, whereas suspend instance suspends only that instance or the instances in the range.
+
+   * 5. *resume*: -resume option is used to resume any instance that is in suspended state.  (Note: due to a bug in oozie, the -resume option in some cases may not actually resume the suspended instance/instances)
+   * 6. *kill*: -kill option can be used to kill an instance or multiple instances
+
+
+In all cases where your request is syntactically correct but logically invalid, the instance / instances are returned with the same status as before. Example: trying to resume a KILLED / SUCCEEDED instance will return the instance with KILLED / SUCCEEDED, without actually performing any operation. This is because only an instance in SUSPENDED state can be resumed. The same applies to rerunning a SUSPENDED or RUNNING instance, etc.
+
+---+++Status
+
+Status option via CLI can be used to get the status of a single or multiple instances.  If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Along with the status of the instance, its time is also returned. Log location gives the oozie workflow url.
+If the instance is in WAITING state, missing dependencies are listed.
+
+Example: Suppose a process has 3 instances: one has succeeded, one is in running state and the other one is waiting. The expected output is:
+
+{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"}, {"instance":"2010-01-02T11:05Z","status":"WAITING"}]}
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
+
 ---+++Kill
 
 Kill sub-command is used to kill all the instances of the specified process whose nominal time is between the given start time and end time.
@@ -94,73 +175,79 @@ Example:   01 Jan 2012 01:00  => 2012-01
 
 3. Process name is compulsory parameter for each instance management command.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -kill -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
 
 ---+++Suspend
 
 Suspend is used to suspend an instance or instances for the given process. This option pauses the parent workflow at the state it was in when the command was executed.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -suspend -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
 
 ---+++Continue
 
 Continue option is used to continue the failed workflow instance. This option is valid only for process instances in terminal state, i.e. SUCCEEDED, KILLED or FAILED.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -re-run -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
 
 ---+++Rerun
 
 Rerun option is used to rerun instances of a given process. This option is valid only for process instances in terminal state, i.e. SUCCEEDED, KILLED or FAILED. Optionally, you can specify the properties to override.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -re-run -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" [-file <<properties file>>]
+</verbatim>
 
 ---+++Resume
 
 Resume option is used to resume any instance that  is in suspended state.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -resume -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
-
----+++Status
-
-Status option via CLI can be used to get the status of a single or multiple instances.  If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Along with the status of the instance time is also returned. Log location gives the oozie workflow url
-If the instance is in WAITING state, missing dependencies are listed
-
-Example : Suppose a process has 3 instance, one has succeeded,one is in running state and other one is waiting, the expected output is:
-
-{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"}, {"instance":"2010-01-02T11:05Z","status":"WAITING"}] 
-
-Usage:
-$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
 
 ---+++Running
 
 Running option provides all the running instances of the mentioned process.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -running
+</verbatim>
 
 ---+++Logs
 
 Get logs for instance actions
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -logs -start "yyyy-MM-dd'T'HH:mm'Z'" [-end "yyyy-MM-dd'T'HH:mm'Z'"] [-runid <<runid>>]
-
+</verbatim>
 
 ---++Admin Options
 
 ---+++Help
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon admin -help
+</verbatim>
 
 ---+++Version
 
 Version returns the current version of Falcon installed.
+
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon admin -version
+</verbatim>
\ No newline at end of file
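
The instance time format and the status output described in FalconCLI.twiki above can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the commit: the helper names (to_falcon_time, instance_status_args, instances_by_status) are hypothetical, naive datetimes are assumed to already be in UTC, and the sample response is the example from the Status section with its closing brace added so that it parses as JSON. Only flags that appear in the usage lines above are used.

```python
import json
from datetime import datetime, timezone


def to_falcon_time(dt):
    """Format a datetime as yyyy-MM-dd'T'HH:mm'Z' (minute precision, UTC).

    Assumption: a naive datetime is treated as already being in UTC.
    """
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%MZ")


def instance_status_args(entity_type, name, start, end=None):
    """Build the arguments that follow $FALCON_HOME/bin/falcon for -status.

    Hypothetical helper; it only assembles the flags shown in the usage text.
    """
    args = ["instance", "-type", entity_type, "-name", name, "-status",
            "-start", to_falcon_time(start)]
    if end is not None:
        # -end is only meaningful together with -start.
        args += ["-end", to_falcon_time(end)]
    return args


def instances_by_status(response_text):
    """Group instance times by their status from a status response."""
    grouped = {}
    for item in json.loads(response_text).get("instances", []):
        grouped.setdefault(item["status"], []).append(item["instance"])
    return grouped


# Sample response taken from the Status section (closing brace added).
SAMPLE = (
    '{"status":"SUCCEEDED","message":"getStatus is successful","instances":['
    '{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},'
    '{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"},'
    '{"instance":"2010-01-02T11:05Z","status":"WAITING"}]}'
)

if __name__ == "__main__":
    print(to_falcon_time(datetime(2012, 1, 1, 1, 0)))  # 2012-01-01T01:00Z
    print(instance_status_args("process", "sampleProcess",
                               datetime(2012, 5, 7, 5, 0),
                               datetime(2012, 5, 7, 5, 10)))
    print(instances_by_status(SAMPLE))
```

Building an argument list rather than a single command string avoids shell quoting problems around the timestamps if the list is later handed to a process launcher.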

Modified: incubator/falcon/trunk/src/site/twiki/index.twiki
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/src/site/twiki/index.twiki?rev=1491876&r1=1491875&r2=1491876&view=diff
==============================================================================
--- incubator/falcon/trunk/src/site/twiki/index.twiki (original)
+++ incubator/falcon/trunk/src/site/twiki/index.twiki Tue Jun 11 16:59:54 2013
@@ -15,7 +15,7 @@ configurations are expressed in such a w
 explicitly described. This information about inter-dependencies between various entities allows Falcon
 to orchestrate and manage various data management functions.
 
-Falcon was successfully accepted as an incubation project in April 2013 and is now in apache incubation.
+Falcon was accepted as an incubation project in April 2013 and is now in Apache incubation.
 
 
 <div id="components" class="carousel slide">