You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2019/12/05 00:37:04 UTC

[GitHub] [incubator-tvm] DKXXXL opened a new issue #4468: [RFC] Data-flow Analysis Functionality on TVM IR

DKXXXL opened a new issue #4468: [RFC] Data-flow Analysis Functionality on TVM IR
URL: https://github.com/apache/incubator-tvm/issues/4468
 
 
   # Problem
   
   When developing program passes on TVM IR (the one once was Halide IR), it is normal to ask for all sorts of information requiring program analysis, for example, live variable analysis for dead code elimination. This requirement becomes urgent when TVM has to directly issue intrinsic and the subsequent processing stage (for example LLVM) cannot analyze these outputs. 
   
   # Consideration
   
   *  Program Analysis Functionality must be consistent with the immutable style of TVM IR/AST
   *  A Simple Framework is required to eliminate as much boilerplate as possible when the programmer tries to come up with other types of data-flow analysis
   *  The interface must be easy to adapt to
   
   We first start with a concrete example -- Live Variable Analysis.
   
   ###  Live Variable Analysis
   
   I will take Live Variable Analysis as an example. I will define a functor, **inputting** *an AST* and *the set of live variable* after the AST is executed, **outputting** *the set of live variable* before the AST is executed.
   
   ```cpp
   class LiveAnalysisFunctor : public ExprFunctor<set(const Expr&, const set&)>,
                                public StmtFunctor<set(const Stmt&, const set&)>
   ```
   
   This functor can be easily defined inductively on the *AST*, demonstrated in pseudo-code
   ```haskell
   -- the function below taking AST and post-live-variables
   LiveAnalysisFunctor (Block{stmtX,stmtY}) postAlive = 
        let intermediateAlive = LiveAnalysisFunctor stmtY postAlive
        in LiveAnalysisFunctor stmtX intermediateAlive
   ```
   In other words, we propagate up the live variable from the end to the beginning, **according to the execution order of TVM IR**
   
   When encountering *if*-statement, we do similarly
   ```haskell
   LiveAnalysisFunctor (IfStmt{cond,branchThen, branchElse}) postAlive = 
        let beforeThen = LiveAnalysisFunctor branchThen postAlive
             beforeElse   = LiveAnalysisFunctor branchElse postAlive
             afterConditional = Union(beforeThen, beforeElse)
        in LiveAnalysisFunctor cond afterConditional
   ```
   The computation of *for*-statement is the only that requiring fixpoint computation. 
   
   Also, we know at the endpoint, the live variables will be empty. 
   
   Several (only relevant to live variable analysis) trivial fact:
   * two variables are the same when they are declared at the same place in the program (since we have scopes in the AST, we also have variables with the same name)
   * We have to distinguish, for each variable/FunctionRef, whether it is def or use
   
   #### Memorization
   
   You can see that in the current situation, to have the live variable information at arbitrary point, we need to calculate from very end to the point (and if that point is inside a for-loop, we need to make sure the fixpoint is reached). Thus it is suggested we only compute one time and saving the result. Fortunately, the above functor is pure, thus we can try to cache the above functor into a standard dictionary/map data structure, and thus *an AST* and *the set of live variable* as **key**, and *the set of live variable* before the AST as **value**. 
   
   
   The memorization strategy should be able to apply to any *pure* IRFunctor:
   ```cpp
   template <functor>
   class MemorizedVersion<functor> : functor
   ```
   
   During the computation of the liveness of the very beginning of the program, the whole program will be visited by the Functor and thus will be cached into the dictionary.
   
   However, we may only need the information when fixpoint is reached, which means we can keep forgetting the old value every time a new pair of *AST* and *PostAlive* is computed while the *AST* has already once inserted into the dictionary with other *PostAlive'*. 
   
   Ultimately, we want to transform the above dictionary into another dictionary taking *AST* as **key**, and the *live variable before executing AST* and *live variable after executing AST* as **value**. Of course, this dictionary has a clear specification that "only when the end of the whole program has empty set as live variables, each point in the AST has the live variable set as indicated in the dictionary".
   
   Also note that, it is possible to query everywhere in the program about the Liveness Infomation, as long as the Functor is properly defined everywhere.
   
   # Proposal
   
   We generalize the above functor into forward/backward data-flow analysis 'framework' to remove boilerplate code as much as possible
   
   ```cpp
   // forward, functional, DFA
   template <class lattice,        // the type lattice, also the prestate/poststate type
         class ltProof,      // the proof that lattice is a lattice, a subtype of IsLattice<lattice>
         const ltProof& ltOp> // lattice Operations, bind the operation at compile time
   class ffdfa : public ExprFunctor<lattice(const Expr&, const lattice&)>,
                 public StmtFunctor<lattice(const Stmt&, const lattice&)> {
     static_assert(std::is_base_of<IsLattice<lattice>, ltProof>::value, "ltProof must be a class derived from IsLattice<>");
   ...
   }
   
   template<class T>
   class IsLattice { // classical lattice definition, T is lattice under the following operation
     public:
       // a least upperbound of two
       T join(const T& a, const T& b) = 0;
       // a greatest lower bound of two
       T meet(const T& a, const T& b) = 0;
       // highest
       const T& top() = 0;
       // whole set
       const T& bottom() = 0;
       bool equal(const T& a, const T& b) = 0;
   };
   ```
   
   I will indicate the usage of `IsLattice` in the comment section. This part (the way defining lattice) is actually quite fluid and more about the programming style.
   
   There will also be another class `bfdfa`.
   
   Memorization part:
   ```cpp
   
   template <typename R, // Return type of the functor that want to be memorized
               typename KType,  // key type of the dictionary
               MemorizingPolicy m, // currently only have policy 'override'
               typename ParentFunctor, // the functor that want to be memorized
               typename ...Args, 
               typename ...InitArgs> // initialization argument to used when initializing the dictionary
   class RecordingExprFunctor<KType(const Expr& n, Args... args),  // expect  function that map the input to the KeyType
                             ParentFunctor,
                             m> : public ParentFunctor { ...
   using ThisFunctionType = R(const Expr& n, Args... args);
   static_assert(std::is_base_of<ExprFunctor<ThisFunctionType>, ParentFunctor>::value, "ParentFunctor must be of ExprFunctor Type");
   
   
   // we propose to have a unique_ptr to the memorizing dictionary (named as "A", that record the evaluation of the functor), so that reuse of MemorizedVersion will not act on the already filled dictionary (after each memorization, std::move it out).
   
   // we will have another dictionary (called "B" here), that map KeyType to the input type
   
   
   R VisitExpr(const Expr& n, Args... args) override;
   
   // Everytime VisitExpr is called with input ‘x’, it will first check if the current dict B has the KeyType and if it has and the key is also mapped to the "same" input 'x', then we can just query the dict A
   // Otherwise we need to evaluate and according to the above MemorizingPolicy 'override', we will insert(replace) the new(already existent) the evaluated value into dict A
   
   // Designed in this twisted way so that if KeyType = AST, InputType=<AST x Lattice>, we can capture the final value of the fixpoint computation.
   
   // Of course here if KeyType = InputType, it will become trivial memorization and don't need dict B
   }
   
   // Similar for StmtFunctor
   
   // Then IRFunctor
   template <typename R, 
               typename KType, 
               MemorizingPolicy m,
               typename ParentFunctor,
               typename ...Args, 
               typename ...InitArgs>
   class RecordingIRFunctor<KType(const NodeRef& n, Args... args), 
                             ParentFunctor,
                             m> : RecordingExprFunctor<KType(const Expr& n, Args... args), ParentFunctor, m>,
                                  RecordingStmtFunctor<KType(const Stmt& n, Args... args), ParentFunctor, m> {...
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services