Posted to dev@pig.apache.org by Alan Gates <ga...@yahoo-inc.com> on 2008/05/27 18:47:34 UTC
Re: [Pig Wiki] Update of "PlanTestingHelper" by PiSong
Pi, this looks great. Pradeep, I think you might be able to use this
for the combiner testing you need to add.
Alan.
Apache Wiki wrote:
> Dear Wiki user,
>
> You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
>
> The following page has been changed by PiSong:
> http://wiki.apache.org/pig/PlanTestingHelper
>
> New page:
> = Plan Testing Helper =
>
> This is a small utility that I developed for testing my type-checking logic. I think it might be useful to other people as well, so I have refactored it a bit to make it more generic.
>
> == Use cases ==
> Here are the steps I follow for type checking:
> * Construct a plan
> * Run type-checking logic against the plan
> * Construct the expected plan
> * Compare structures of the actual plan and the expected plan.
>
> Here are the steps one might follow for testing the query parser:
> * Given a query string, construct the plan.
> * Construct the expected plan
> * Compare two plans
>
> And here are the steps for testing the plan optimizer:
> * Construct a plan
> * Run optimizer
> * Construct the expected plan
> * Compare structures of the actual plan and the expected plan.
>
> == What can be facilitated? ==
> So there are two steps common to all of the use cases above:
> 1. Construct the expected plan
> 1. Compare two plans
>
> == Construct a plan ==
>
> ==== What is the Dot language? ====
> Dot is a plain-text graph description language. It has three main object types: node, edge, and graph, and all of them can carry custom attributes.
>
> ==== Sample Dot graph ====
> {{{
> digraph plan1 {
>
> load [color="black"]
>
> load -> distinct -> split -> splitOut1 [style=dotted] ;
> split -> splitOut2 ;
> splitOut1 -> cross ;
> splitOut2 -> cross ;
> }
> }}}
>
> '''Note''': "digraph" declares that this is a directed graph, which is the kind of graph we're interested in.
>
> '''Note''': "load [color="black"]" attaches an attribute to the node. Attributes are optional.
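>
> In Dot, an edge chain such as "load -> distinct -> split" is shorthand for one edge between each consecutive pair of nodes, and an attribute list after the chain applies to all of those edges. A loader therefore has to expand each chain into individual edges. The following is a minimal sketch of that step only, assuming one statement per string and no quoted node names; the class name EdgeChainExpander is hypothetical and not part of the proposed API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of the proposed API): expands a single Dot
 * edge statement such as "a -> b -> c [style=dotted] ;" into its
 * individual edges. The attribute list, if any, is discarded.
 */
public class EdgeChainExpander {

    /** Returns one {from, to} pair per edge in the chain. */
    public static List<String[]> expand(String statement) {
        String s = statement.trim();
        // Drop the optional trailing ';'
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        // Drop a trailing attribute list such as [style=dotted]
        int bracket = s.indexOf('[');
        if (bracket >= 0) {
            s = s.substring(0, bracket).trim();
        }
        // "a -> b -> c" becomes the edges a->b and b->c
        String[] names = s.split("->");
        List<String[]> edges = new ArrayList<>();
        for (int i = 0; i + 1 < names.length; i++) {
            edges.add(new String[] { names[i].trim(), names[i + 1].trim() });
        }
        return edges;
    }

    public static void main(String[] args) {
        String stmt = "load -> distinct -> split -> splitOut1 [style=dotted] ;";
        for (String[] edge : expand(stmt)) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

> Running main prints the three edges load -> distinct, distinct -> split and split -> splitOut1.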
>
> By attaching a few custom attributes to each node, we can encode our logical plan in Dot like this:
>
> {{{
> digraph graph1 {
>
> load [key="114", type="LOLoad", schema="field1: int, field2: float"] ;
> distinct [key="115", type="LODistinct", schema="field1: int, field2: float"] ;
> split [key="116", type="LOSplit", schema="field1: int, field2: float"] ;
> splitOut1 [key="117", type="LOForEach", schema="field1: int, field2: float"] ;
> splitOut2 [key="118", type="LOForEach", schema="field1: int, field2: float"] ;
> cross [key="119", type="LOCross", schema="field1: int, field2: float, field3: chararray"] ;
>
> load -> distinct -> split -> splitOut1 ;
> split -> splitOut2 ;
> splitOut1 -> cross ;
> splitOut2 -> cross ;
> }
> }}}
> This can then be translated into a plan by a loader class (see OperatorPlanLoader in the Appendix).
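>
> Each node line above is a node name followed by a list of name="value" attributes. A loader could pull these apart with two regular expressions, as in this minimal sketch. It is an illustration under assumptions, not the loader class itself: the name NodeStatementParser is hypothetical, and only double-quoted values without escaped quotes are handled (an unquoted attribute such as style=dotted is ignored). Because the schema value contains commas, attributes are matched by pattern rather than by splitting on ','.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: parses a Dot node statement such as
 *   load [key="114", type="LOLoad", schema="field1: int, field2: float"] ;
 * into its name and attribute map. Only double-quoted values are handled.
 */
public class NodeStatementParser {

    // <name> [ <attributes> ] with an optional trailing ';'
    private static final Pattern NODE =
            Pattern.compile("^\\s*(\\w+)\\s*\\[(.*)\\]\\s*;?\\s*$");

    // One attribute: key="value" (the value may contain commas and colons)
    private static final Pattern ATTR =
            Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    /** Returns the node name, or null if the line is not a node statement. */
    public static String name(String line) {
        Matcher m = NODE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    /** Returns the attributes in declaration order (empty if not a node statement). */
    public static Map<String, String> attributes(String line) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = NODE.matcher(line);
        if (!m.matches()) {
            return attrs;
        }
        Matcher a = ATTR.matcher(m.group(2));
        while (a.find()) {
            attrs.put(a.group(1), a.group(2));
        }
        return attrs;
    }

    public static void main(String[] args) {
        String line = "load [key=\"114\", type=\"LOLoad\", schema=\"field1: int, field2: float\"] ;";
        System.out.println(name(line) + " " + attributes(line));
    }
}
```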
>
> == Compare two plans ==
>
> I will provide an API like this:
>
> {{{
> /**
> * This abstract class is a base for plan comparers.
> */
>
> public abstract class PlanStructuralComparer<E extends Operator,
> P extends OperatorPlan<E>> {
>
> /**
> * This method does structural comparison of two plans based on:
> * - Graph connectivity
> *
> * The current implementation is based on simple key-based
> * vertex matching.
> *
> * @param plan1 the first plan
> * @param plan2 the second plan
> * @param messages where the error messages go
> * @return true if the plans are structurally equal
> */
> public boolean structurallyEquals(P plan1, P plan2, StringBuilder messages) {
> // Body not shown: match vertices by key, then compare connectivity
> }
>
>
> /**
> * Same as above, for when you just want to compare the plans
> * and don't need the error messages.
> * @param plan1 the first plan
> * @param plan2 the second plan
> * @return true if the plans are structurally equal
> */
> public boolean structurallyEquals(P plan1, P plan2) {
> return structurallyEquals(plan1, plan2, new StringBuilder()) ;
> }
> }
> }}}
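>
> The "simple key-based vertex matching" mentioned in the Javadoc can be illustrated without any Pig classes. In the minimal sketch below, which is an assumption rather than the planned implementation, each plan is reduced to a map from operator key to the keys of its successors. Two plans match when they contain the same keys and every key has the same successor set; since each edge appears in exactly one successor set, this also covers predecessors. KeyBasedConnectivity is a hypothetical name, and Map.of/Set.of need Java 9 or later.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of key-based structural comparison. A plan is
 * modelled as: operator key -> keys of that operator's successors.
 */
public class KeyBasedConnectivity {

    public static boolean structurallyEquals(Map<String, Set<String>> plan1,
                                             Map<String, Set<String>> plan2,
                                             StringBuilder messages) {
        boolean equal = true;
        // Visit the union of keys in a stable (sorted) order
        Set<String> allKeys = new TreeSet<>(plan1.keySet());
        allKeys.addAll(plan2.keySet());
        for (String key : allKeys) {
            Set<String> succ1 = plan1.get(key);
            Set<String> succ2 = plan2.get(key);
            // Vertex matching: every key must exist in both plans
            if (succ1 == null || succ2 == null) {
                messages.append("Key ").append(key)
                        .append(" exists in only one plan\n");
                equal = false;
                continue;
            }
            // Connectivity: matched vertices must have the same successors
            if (!succ1.equals(succ2)) {
                messages.append("Successors of ").append(key).append(" differ: ")
                        .append(new TreeSet<>(succ1)).append(" vs ")
                        .append(new TreeSet<>(succ2)).append('\n');
                equal = false;
            }
        }
        return equal;
    }

    // Convenience overload that discards the messages
    public static boolean structurallyEquals(Map<String, Set<String>> plan1,
                                             Map<String, Set<String>> plan2) {
        return structurallyEquals(plan1, plan2, new StringBuilder());
    }

    public static void main(String[] args) {
        // Keys modelled on the sample Dot plan (load=114 ... cross=119)
        Map<String, Set<String>> expected = Map.of(
                "114", Set.of("115"), "115", Set.of("116"),
                "116", Set.of("117", "118"),
                "117", Set.of("119"), "118", Set.of("119"),
                "119", Set.of());
        // Same plan with the split -> splitOut2 edge missing
        Map<String, Set<String>> actual = Map.of(
                "114", Set.of("115"), "115", Set.of("116"),
                "116", Set.of("117"),
                "117", Set.of("119"), "118", Set.of("119"),
                "119", Set.of());
        StringBuilder messages = new StringBuilder();
        System.out.println(structurallyEquals(expected, actual, messages));
        System.out.print(messages);
    }
}
```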
>
> A subclass that also compares type information would look like this:
>
> {{{
> /**
> * This class is used for LogicalPlan comparison.
> */
> public class LogicalPlanComparer
> extends PlanStructuralComparer<LogicalOperator, LogicalPlan> {
>
> /**
> * This method does naive structural comparison of two plans.
> *
> * Things we compare:
> * - Things compared in the super class
> * - Types of matching nodes
> * - Schema associated with each operator
> *
> * @param plan1 the first plan
> * @param plan2 the second plan
> * @param messages where the error messages go
> * @return true if the plans are structurally equal
> */
> @Override
> public boolean structurallyEquals(LogicalPlan plan1,
> LogicalPlan plan2,
> StringBuilder messages) {
> // Stage 1: Compare connectivity
> if (!super.structurallyEquals(plan1, plan2, messages)) return false ;
>
> // Stage 2: Compare node types
> if (isMismatchNodeType(plan1, plan2, messages)) return false ;
>
> // Stage 3: Compare schemas
> if (isMismatchSchemas(plan1, plan2, messages)) return false ;
>
> // All stages passed
> return true ;
> }
>
> // isMismatchNodeType() and isMismatchSchemas() are helpers (not shown)
> }
> }}}
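>
> Stages 2 and 3 only need to look at operators whose keys were already matched in stage 1. As a rough sketch of what helpers like isMismatchNodeType() and isMismatchSchemas() might do, the hypothetical class below (not the actual LogicalPlanComparer) models each plan's operators as key -> attribute map, i.e. the type and schema values from the Dot encoding, and compares one attribute at a time. Comparing schemas as plain strings is a simplification; a real comparer would compare the parsed schemas field by field.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/**
 * Hypothetical sketch of comparison stages 2 and 3. Each plan's operators
 * are modelled as: operator key -> attributes (e.g. "type", "schema").
 */
public class AttributeStages {

    /** Returns true, and records messages, if any matched operator differs in attr. */
    public static boolean isMismatch(String attr,
                                     Map<String, Map<String, String>> ops1,
                                     Map<String, Map<String, String>> ops2,
                                     StringBuilder messages) {
        boolean mismatch = false;
        // Sorted iteration keeps the messages in a stable order
        for (Map.Entry<String, Map<String, String>> e : new TreeMap<>(ops1).entrySet()) {
            Map<String, String> other = ops2.get(e.getKey());
            if (other == null) {
                continue; // unmatched keys are reported by the connectivity stage
            }
            String v1 = e.getValue().get(attr);
            String v2 = other.get(attr);
            if (!Objects.equals(v1, v2)) {
                messages.append(attr).append(" of ").append(e.getKey())
                        .append(" differs: ").append(v1).append(" vs ").append(v2).append('\n');
                mismatch = true;
            }
        }
        return mismatch;
    }

    /** Stage 2 (operator types), then stage 3 (schemas), stopping at the first failure. */
    public static boolean attributesEqual(Map<String, Map<String, String>> ops1,
                                          Map<String, Map<String, String>> ops2,
                                          StringBuilder messages) {
        if (isMismatch("type", ops1, ops2, messages)) return false;
        if (isMismatch("schema", ops1, ops2, messages)) return false;
        return true;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> expected = Map.of(
                "114", Map.of("type", "LOLoad", "schema", "field1: int, field2: float"));
        Map<String, Map<String, String>> actual = Map.of(
                "114", Map.of("type", "LOLoad", "schema", "field1: int, field2: double"));
        StringBuilder messages = new StringBuilder();
        System.out.println(attributesEqual(expected, actual, messages));
        System.out.print(messages);
    }
}
```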
>
> == Dot Trick ==
> You can render a graph written in the Dot language like this:
> {{{
> dot -Tpng dot1.dot > dot1.png
> }}}
> Or, to view it in an interactive window:
> {{{
> dotty dot1.dot
> }}}
> NOTE: You need Graphviz installed on your machine to run these commands.
>
> Here is a graph image generated from the sample above:
>
> http://people.apache.org/~pisong/dot1.png
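>
> When a comparison fails in a unit test, it can help to render the plans. Assuming the Dot text is available as a string and the Graphviz dot executable is on the PATH, a test could write it to a file and run the same dot -Tpng command from Java, as in this sketch. DotRenderer is a hypothetical name; "-o" is dot's output-file option and has the same effect as the shell redirect above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical helper: writes Dot text to a file and renders it to PNG
 * by running Graphviz's "dot" executable (assumed to be on the PATH).
 */
public class DotRenderer {

    /** Builds the command line, equivalent to: dot -Tpng in.dot > out.png */
    public static List<String> renderCommand(Path dotFile, Path pngFile) {
        return List.of("dot", "-Tpng", dotFile.toString(), "-o", pngFile.toString());
    }

    /** Writes the Dot text, runs dot, and returns its exit code (0 on success). */
    public static int render(String dotContent, Path dotFile, Path pngFile)
            throws IOException, InterruptedException {
        Files.write(dotFile, dotContent.getBytes(StandardCharsets.UTF_8));
        Process p = new ProcessBuilder(renderCommand(dotFile, pngFile))
                .inheritIO() // show any Graphviz errors on the console
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        String dot = "digraph plan1 { load -> distinct -> split ; }";
        int exit = render(dot, Paths.get("dot1.dot"), Paths.get("dot1.png"));
        System.out.println("dot exited with " + exit);
    }
}
```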
>
> = Current Status & Issues =
> * Working code will be available in 1-2 days (as of 26 May).
> * Inner plans are not supported yet. They may have to be constructed and compared separately.
>
> == Appendix ==
>
> The API:
>
> OperatorPlanLoader - an abstract base class for loading a plan from Dot text
>
> {{{
> public abstract class OperatorPlanLoader<E extends Operator,
> P extends OperatorPlan<E>> {
>
> /**
> * This method is used for loading an operator plan encoded in Dot format
> * @param dotContent the dot content
> * @return the loaded plan
> */
> public P load(String dotContent) {
> // Body not shown; createOperator() below instantiates each node
> }
>
> /**
> * This method has to be overridden to instantiate the correct node type
> *
> * @param node the Dot node to convert
> * @param plan the plan being built
> * @return the new operator
> */
> protected abstract E createOperator(Node node, P plan) ;
> }
> }}}
>
> Structures captured from Dot (before being converted into a plan):
>
> {{{
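> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
>
> /**
> * This represents a graph structure in DOT format
> */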
> public class DotGraph {
>
> public String name;
> public List<Edge> edges = new ArrayList<Edge>() ;
> public List<Node> nodes = new ArrayList<Node>() ;
> public Map<String, String> attributes = new HashMap<String,String>() ;
>
>
> public DotGraph(String name) {
> this.name = name ;
> }
>
> }
> }}}
> {{{
> import java.util.HashMap;
> import java.util.Map;
>
> /**
> * This represents a node in DOT format
> */
> public class Node {
>
> public String name ;
> public Map<String, String> attributes = new HashMap<String,String>() ;
>
> }
> }}}
> {{{
> /**
> * This represents an edge in DOT format.
> * An edge in DOT can have attributes, but we ignore them here.
> */
> public class Edge {
> public String fromNode ;
> public String toNode ;
> }
> }}}
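>
> To show how these structures might feed OperatorPlanLoader.createOperator(), here is a self-contained sketch of the loading step after parsing. It is an assumption about how load() could work, not the actual implementation: SketchLoader, Loader and SimplePlan are hypothetical names, a plan is reduced to an operator map plus successor lists, and the Node and Edge classes above are mirrored in simplified form so the example compiles on its own. It also rejects edges that name an undeclared node, because Dot itself would silently create a new node for a misspelled or differently-cased name such as splitout1 versus splitOut1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the node-then-edge loading step. */
public class SketchLoader {

    // Simplified mirrors of the Node and Edge classes above
    public static class Node {
        public String name;
        public Map<String, String> attributes = new HashMap<>();
        public Node(String name) { this.name = name; }
    }

    public static class Edge {
        public String fromNode;
        public String toNode;
        public Edge(String fromNode, String toNode) {
            this.fromNode = fromNode;
            this.toNode = toNode;
        }
    }

    /** A toy plan: operators by Dot node name, plus successor lists. */
    public static class SimplePlan<E> {
        public final Map<String, E> operators = new LinkedHashMap<>();
        public final Map<String, List<String>> successors = new LinkedHashMap<>();
    }

    /** Subclasses decide which operator type to build for each node. */
    public abstract static class Loader<E> {
        protected abstract E createOperator(Node node, SimplePlan<E> plan);

        public SimplePlan<E> load(List<Node> nodes, List<Edge> edges) {
            SimplePlan<E> plan = new SimplePlan<>();
            // Step 1: one operator per declared node
            for (Node node : nodes) {
                plan.operators.put(node.name, createOperator(node, plan));
                plan.successors.put(node.name, new ArrayList<>());
            }
            // Step 2: connect operators, rejecting undeclared names
            for (Edge edge : edges) {
                if (!plan.operators.containsKey(edge.fromNode)
                        || !plan.operators.containsKey(edge.toNode)) {
                    throw new IllegalArgumentException("Edge refers to an undeclared node: "
                            + edge.fromNode + " -> " + edge.toNode);
                }
                plan.successors.get(edge.fromNode).add(edge.toNode);
            }
            return plan;
        }
    }

    public static void main(String[] args) {
        // Example subclass: the "operator" is just the node's type attribute
        Loader<String> loader = new Loader<String>() {
            @Override
            protected String createOperator(Node node, SimplePlan<String> plan) {
                return node.attributes.get("type");
            }
        };
        Node load = new Node("load");
        load.attributes.put("type", "LOLoad");
        Node distinct = new Node("distinct");
        distinct.attributes.put("type", "LODistinct");
        SimplePlan<String> plan = loader.load(List.of(load, distinct),
                List.of(new Edge("load", "distinct")));
        System.out.println(plan.operators + " " + plan.successors);
    }
}
```

> A real subclass would return LogicalOperator instances built from the type and schema attributes instead of plain strings.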
>