Posted to dev@pig.apache.org by Alan Gates <ga...@yahoo-inc.com> on 2008/05/27 18:47:34 UTC
Re: [Pig Wiki] Update of "PlanTestingHelper" by PiSong

Pi, this looks great.  Pradeep, I think you might be able to use this 
for the combiner testing you need to add.

Alan.

Apache Wiki wrote:
> Dear Wiki user,
>
> You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
>
> The following page has been changed by PiSong:
> http://wiki.apache.org/pig/PlanTestingHelper
>
> New page:
> = Plan Testing Helper =
>
> This is a small utility that I developed for testing my type-checking logic. I think it might be useful to other people as well, so I have refactored it a bit to make it more generic.
>
> == Use cases ==
> Here are the steps I follow for type checking:-
>  * Construct a plan
>  * Run type-checking logic against the plan
>  * Construct the expected plan
>  * Compare the structures of the actual plan and the expected plan.
>
> Here are the steps one might follow for the query parser:-
>  * Given a query string, construct the plan.
>  * Construct the expected plan
>  * Compare two plans
>
> And here are the steps for testing the plan optimizer:-
>  * Construct a plan
>  * Run optimizer
>  * Construct the expected plan
>  * Compare the structures of the actual plan and the expected plan.
>
> == What can be facilitated? ==
> So there are two steps common to all of the use cases above:-
>  1. Construct the expected plan
>  1. Compare two plans
>
> == Construct a plan ==
>
> ==== What is Dot Language? ====
> Dot is a plain-text graph description language (the input language of Graphviz). There are three main object types: node, edge, and graph. All of them can carry custom attributes.
>
> ==== Sample Dot graph ====
> {{{
> digraph plan1 {
>     
>     load [color="black"]
>
>     load -> distinct -> split -> splitOut1 [style=dotted] ;
>     split -> splitOut2 ;
>     splitOut1 -> cross ;
>     splitOut2 -> cross ;
> }
> }}}
>
> '''Note''': "digraph" declares a directed graph, which is the kind of graph we're interested in.
>
> '''Note''': {{{load [color="black"]}}} attaches an attribute to the node. Attributes are optional.
>
> By extending Dot a bit, we can encode our logical plan in the following format:-
>
> {{{
> digraph graph1 {
>
>     load        [key="114", type="LOLoad", schema="field1: int, field2: float"] ;
>     distinct    [key="115", type="LODistinct", schema="field1: int, field2: float"] ;
>     split       [key="116", type="LOSplit", schema="field1: int, field2: float"] ;
>     splitOut1   [key="117", type="LOForEach", schema="field1: int, field2: float"] ;
>     splitOut2   [key="118", type="LOForEach", schema="field1: int, field2: float"] ;
>     cross       [key="119", type="LOCross", schema="field1: int, field2: float, field3: chararray"] ;
>
>     load -> distinct -> split -> splitOut1 ;
>     split -> splitOut2 ;
>     splitOut1 -> cross ;
>     splitOut2 -> cross ;
> }
> }}}
> This can then be translated into a plan by a loader class (see OperatorPlanLoader in the Appendix).
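>
> As an illustration, here is a minimal sketch of how a loader might pull the node name and attributes out of one node statement of the extended format. The class and method names are hypothetical and not part of the proposed API. It only handles double-quoted attribute values, which covers key, type and schema; note that a schema value itself contains commas, so naively splitting the attribute list on ',' would not work.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of the proposed API) that parses one node
 * statement of the extended Dot format, e.g.
 *   load [key="114", type="LOLoad", schema="field1: int, field2: float"] ;
 * Only double-quoted attribute values are recognised.
 */
final class NodeStatementParser {

    // A node name, a bracketed attribute list, then an optional ';'.
    private static final Pattern NODE =
            Pattern.compile("\\s*(\\w+)\\s*\\[(.*)\\]\\s*;?\\s*");

    // name="value" pairs; quoted values may contain commas and colons.
    private static final Pattern ATTR =
            Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    private NodeStatementParser() {}

    private static Matcher match(String line) {
        Matcher m = NODE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a node statement: " + line);
        }
        return m;
    }

    /** Returns the node name, e.g. "load". */
    static String name(String line) {
        return match(line).group(1);
    }

    /** Returns the quoted attributes in declaration order. */
    static Map<String, String> attributes(String line) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher a = ATTR.matcher(match(line).group(2));
        while (a.find()) {
            attrs.put(a.group(1), a.group(2));
        }
        return attrs;
    }
}
```

> Edge statements such as {{{load -> distinct ;}}} are rejected by this sketch, so a loader would need to route them to a separate edge parser.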
>
> == Compare two plans ==
>
> I will provide an API like this:-
>
> {{{
> /**
>  * This abstract class is a base for plan comparers.
>  */
>
> public abstract class PlanStructuralComparer<E extends Operator,
>                                              P extends OperatorPlan<E>> {
>
>     /**
>      * This method does structural comparison of two plans based on:-
>      *  - Graph connectivity
>      *
>      * The current implementation is based on simple key-based
>      * vertex matching.
>      *
>      * @param plan1 the first plan
>      * @param plan2 the second plan
>      * @param messages where the error messages go
>      * @return true if the two plans are structurally equal
>      */
>     public boolean structurallyEquals(P plan1, P plan2, StringBuilder messages) ; // body omitted in this outline
>
>
>     /**
>      * Same as above, for when you just want to compare and
>      * don't need the error messages.
>      * @param plan1 the first plan
>      * @param plan2 the second plan
>      * @return true if the two plans are structurally equal
>      */
>     public boolean structurallyEquals(P plan1, P plan2) {
>         return structurallyEquals(plan1, plan2, new StringBuilder()) ;
>     }
> }
> }}}
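>
> To illustrate what "simple key-based vertex matching" could mean, here is a self-contained sketch. It does not use Pig's OperatorPlan: as an assumption, each plan is reduced to a map from operator key to the set of its successors' keys, and the class name KeyBasedComparer is hypothetical. Vertices are matched purely by key, then connectivity is compared vertex by vertex.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of key-based structural comparison. Each plan is
 * modelled as a map from operator key to the keys of that operator's
 * successors; this stands in for OperatorPlan and is not Pig's API.
 * Every key is assumed to map to a (possibly empty) set.
 */
final class KeyBasedComparer {

    private KeyBasedComparer() {}

    static boolean structurallyEquals(Map<String, Set<String>> plan1,
                                      Map<String, Set<String>> plan2,
                                      StringBuilder messages) {
        // Vertex matching: both plans must contain exactly the same keys.
        if (!plan1.keySet().equals(plan2.keySet())) {
            Set<String> onlyIn1 = new TreeSet<>(plan1.keySet());
            onlyIn1.removeAll(plan2.keySet());
            Set<String> onlyIn2 = new TreeSet<>(plan2.keySet());
            onlyIn2.removeAll(plan1.keySet());
            messages.append("Keys only in plan1: ").append(onlyIn1)
                    .append("; keys only in plan2: ").append(onlyIn2).append('\n');
            return false;
        }
        // Connectivity: each matched vertex must have the same successors.
        boolean equal = true;
        for (String key : new TreeSet<>(plan1.keySet())) {
            Set<String> succ1 = plan1.get(key);
            Set<String> succ2 = plan2.get(key);
            if (!succ1.equals(succ2)) {
                messages.append("Successors of ").append(key).append(" differ: ")
                        .append(new TreeSet<>(succ1)).append(" vs ")
                        .append(new TreeSet<>(succ2)).append('\n');
                equal = false;
            }
        }
        return equal;
    }

    /** Convenience overload that discards the messages. */
    static boolean structurallyEquals(Map<String, Set<String>> plan1,
                                      Map<String, Set<String>> plan2) {
        return structurallyEquals(plan1, plan2, new StringBuilder());
    }
}
```

> Comparing the successor sets of every matched key also covers predecessors, since each edge is recorded at its source vertex. Iterating over a sorted copy of the keys keeps the messages in a stable order.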
>
> A subtype which is interested in type information would look like this:-
>
> {{{
> /**
>  * This class is used for LogicalPlan comparison.
>  */
> public class LogicalPlanComparer
>                 extends PlanStructuralComparer<LogicalOperator, LogicalPlan> {
>
>     /**
>      * This method does naive structural comparison of two plans.
>      *
>      * Things we compare :-
>      *  - Things compared in the super class
>      *  - Types of matching nodes
>      *  - Schema associated with each operator
>      *
>      * @param plan1 the first plan
>      * @param plan2 the second plan
>      * @param messages where the error messages go
>      * @return true if the two plans are structurally equal
>      */
>     @Override
>     public boolean structurallyEquals(LogicalPlan plan1,
>                                       LogicalPlan plan2,
>                                       StringBuilder messages) {
>         // Stage 1: Compare connectivity
>         if (!super.structurallyEquals(plan1, plan2, messages)) return false ;
>
>         // Stage 2: Compare node types
>         if (isMismatchNodeType(plan1, plan2, messages)) return false ;
>
>         // Stage 3: Compare schemas
>         if (isMismatchSchemas(plan1, plan2, messages)) return false ;
>
>         // All stages passed
>         return true ;
>     }
>
>     // The helpers isMismatchNodeType(...) and isMismatchSchemas(...)
>     // are omitted here.
> }
> }}}
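>
> Stages 2 and 3 both come down to comparing one attribute per matched vertex once stage 1 has paired vertices by key. The sketch below assumes, for illustration only, that each plan has been reduced to a map from operator key to a string (the operator type name for stage 2, or a rendered schema for stage 3). The class AttributeStage is hypothetical; a real schema check would compare schema objects rather than strings.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

/**
 * Hypothetical sketch of stages 2 and 3 of LogicalPlanComparer. Each plan is
 * reduced to a map from operator key to one attribute value (operator type
 * name or schema string); this is not Pig's actual API.
 */
final class AttributeStage {

    private AttributeStage() {}

    /**
     * Returns true, and records each difference in messages, if any matched
     * key carries a different value. Stage 1 has already checked that both
     * plans have the same keys, so iterating over plan1's keys is enough.
     */
    static boolean isMismatch(String what,
                              Map<String, String> plan1,
                              Map<String, String> plan2,
                              StringBuilder messages) {
        boolean mismatch = false;
        for (String key : new TreeSet<>(plan1.keySet())) {
            String v1 = plan1.get(key);
            String v2 = plan2.get(key);
            if (!Objects.equals(v1, v2)) {
                messages.append(what).append(" mismatch at key ").append(key)
                        .append(": ").append(v1).append(" vs ").append(v2).append('\n');
                mismatch = true;
            }
        }
        return mismatch;
    }
}
```

> Unlike the outline above, which returns at the first failing stage, this helper reports every mismatching vertex within a stage, which tends to make a failing test easier to diagnose.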
>
> == Dot Trick ==
> One can render a graph written in Dot simply by running:-
> {{{
> dot -Tpng dot1.dot > dot1.png
> }}}
> Or alternatively,
> {{{
> dotty dot1.dot
> }}}
> NOTE: You need Graphviz installed on your machine to run these commands.
>
> Here is the image generated from the sample graph above:-
>
> http://people.apache.org/~pisong/dot1.png
>
> = Current Status & Issues =
>  * Working code will be available in 1-2 days (today is 26th May).
>  * Doesn't work with inner plans yet. Inner plans may have to be constructed and compared separately.
>
> == Appendix ==
>
> The API:-
>
> OperatorPlanLoader - an abstract base class for loading a plan from Dot
>
> {{{
> public abstract class OperatorPlanLoader<E extends Operator,
>                                          P extends OperatorPlan<E>> {
>
>     /**
>      * This method is used for loading an operator plan encoded in Dot format.
>      * @param dotContent the Dot content
>      * @return the plan described by dotContent
>      */
>     public P load(String dotContent) ; // body omitted in this outline
>
>     /**
>      * This method has to be overridden to instantiate the correct node type.
>      *
>      * @param node the Dot node to convert
>      * @param plan the plan the new operator will belong to
>      * @return the new operator
>      */
>     protected abstract E createOperator(Node node, P plan) ;
> }
> }}}
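>
> Here is a rough, self-contained sketch of the template-method idea behind OperatorPlanLoader: the base class walks the parsed nodes and edges, and only the operator-creation step is left to subclasses. Everything in it is an assumption for illustration: SketchPlanLoader and its toy SimplePlan stand in for OperatorPlanLoader and OperatorPlan, and Dot parsing is assumed to have already happened. Unlike plain Dot, where an edge may implicitly create a node, the sketch rejects an edge naming an undeclared node. Since Dot names are case-sensitive, this catches, for example, a node declared as splitout1 but referenced in an edge as splitOut1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for OperatorPlanLoader; not Pig's actual API. */
abstract class SketchPlanLoader<E> {

    /** Toy stand-in for OperatorPlan: operators in insertion order, each with its successors. */
    static final class SimplePlan<T> {
        final Map<T, List<T>> successors = new LinkedHashMap<>();

        void add(T op) {
            successors.putIfAbsent(op, new ArrayList<>());
        }

        void connect(T from, T to) {
            successors.get(from).add(to);
        }
    }

    /** Subclasses decide which operator each Dot node becomes (cf. createOperator). */
    protected abstract E createOperator(String name, Map<String, String> attributes);

    /**
     * Builds a plan from already-parsed nodes (name to attributes) and
     * edges (each a {from, to} pair of node names).
     */
    SimplePlan<E> load(Map<String, Map<String, String>> nodes, List<String[]> edges) {
        SimplePlan<E> plan = new SimplePlan<>();
        Map<String, E> byName = new HashMap<>();

        // Pass 1: create one operator per declared node.
        for (Map.Entry<String, Map<String, String>> node : nodes.entrySet()) {
            E op = createOperator(node.getKey(), node.getValue());
            byName.put(node.getKey(), op);
            plan.add(op);
        }

        // Pass 2: wire the edges, rejecting references to undeclared nodes.
        for (String[] edge : edges) {
            E from = byName.get(edge[0]);
            E to = byName.get(edge[1]);
            if (from == null || to == null) {
                throw new IllegalArgumentException(
                        "Edge references an undeclared node: " + edge[0] + " -> " + edge[1]);
            }
            plan.connect(from, to);
        }
        return plan;
    }
}
```

> Creating all operators before wiring any edge means the order of statements in the Dot file does not matter.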
>
> Structures captured from Dot (before being converted to a plan):-
>
> {{{
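> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
>
> /**
>  * This represents a graph structure in DOT format.
>  */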
> public class DotGraph {
>
>     public String name;
>     public List<Edge> edges = new ArrayList<Edge>() ;
>     public List<Node> nodes = new ArrayList<Node>() ;
>     public Map<String, String> attributes = new HashMap<String,String>() ;
>
>
>     public DotGraph(String name) {
>         this.name = name ;
>     }
>
> }
> }}}
> {{{
> import java.util.HashMap;
> import java.util.Map;
>
> /**
>  * This represents a node in DOT format.
>  */
> public class Node {
>
>     public String name ;
>     public Map<String, String> attributes = new HashMap<String,String>() ;
>
> }
> }}}
> {{{
> /**
>  * This represents an edge in DOT format.
>  * An edge in DOT can have attributes, but we ignore them here.
>  */
> public class Edge {
>     public String fromNode ;
>     public String toNode ;
> }
> }}}
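>
> The sample graphs write edges as chains such as {{{load -> distinct -> split -> splitOut1}}}, so before Edge objects can be filled in, each chain has to be expanded into pairwise edges. Below is a minimal sketch with a hypothetical class name. Each returned pair corresponds to an Edge's fromNode and toNode, and any bracketed attribute list is dropped, just as the Edge class above keeps no attributes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of the proposed API) that expands one Dot
 * edge statement such as
 *   load -> distinct -> split -> splitOut1 [style=dotted] ;
 * into {from, to} pairs. Edge attributes are discarded.
 */
final class EdgeChainParser {

    private EdgeChainParser() {}

    /** Returns one {from, to} pair per arrow in the statement. */
    static List<String[]> parse(String statement) {
        String s = statement.trim();
        // Drop a trailing ';' and any bracketed attribute list.
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        int bracket = s.indexOf('[');
        if (bracket >= 0) {
            s = s.substring(0, bracket).trim();
        }

        String[] names = s.split("\\s*->\\s*");
        if (names.length < 2) {
            throw new IllegalArgumentException("Not an edge statement: " + statement);
        }
        List<String[]> edges = new ArrayList<>();
        // A chain a -> b -> c yields the edges a->b and b->c.
        for (int i = 0; i + 1 < names.length; i++) {
            if (names[i].isEmpty() || names[i + 1].isEmpty()) {
                throw new IllegalArgumentException("Missing node name in: " + statement);
            }
            edges.add(new String[] {names[i], names[i + 1]});
        }
        return edges;
    }
}
```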
>