Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2019/12/31 05:50:57 UTC

[GitHub] [drill] paul-rogers opened a new pull request #1944: DRILL-7503: Refactor the project operator

URL: https://github.com/apache/drill/pull/1944
 
 
   Breaks the big "setup" function into its own class, and
   separates out physical vector setup from logical projection
   planning. No functional change; just rearranging existing
   code.
   
   Testing: reran all unit tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/drill/pull/1944#discussion_r362494309
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
 
 Review comment:
   Please remove the `widthType` field and the `WidthType` enum, as suggested previously.
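
One possible reading of this request, as a minimal sketch: the diff quoted in this thread only ever constructs `ColumnWidthInfo` with `WidthType.VARIABLE`, so the field carries no information and could be dropped along with the enum. The class below is a simplified, hypothetical stand-in (a `String` replaces `OutputWidthExpression`, and `outputVV` is omitted); whether `isFixedWidth()` should also go is an assumption here, not something the review states.

```java
// Simplified, hypothetical stand-in for ProjectMemoryManager.ColumnWidthInfo.
// It is not the Drill class; it only shows the shape after the removal.
class ColumnWidthInfoSketch {

  enum OutputColumnType { TRANSFER, NEW }

  static final class ColumnWidthInfo {
    // Stand-in for OutputWidthExpression so the sketch stays self-contained.
    private final String outputExpression;
    private final OutputColumnType outputColumnType;
    // -1 means "not known yet", as in the quoted diff.
    private final int width;

    // No WidthType parameter: every registered column is variable width.
    ColumnWidthInfo(String outputExpression, OutputColumnType outputColumnType, int width) {
      this.outputExpression = outputExpression;
      this.outputColumnType = outputColumnType;
      this.width = width;
    }

    String getOutputExpression() { return outputExpression; }

    OutputColumnType getOutputColumnType() { return outputColumnType; }

    int getWidth() { return width; }
  }

  public static void main(String[] args) {
    ColumnWidthInfo info =
        new ColumnWidthInfo("VarLenReadExpr(name)", OutputColumnType.TRANSFER, -1);
    System.out.println(info.getOutputColumnType() + " width=" + info.getWidth());
  }
}
```

The constructor loses one argument, so both call sites in `addVariableWidthField()` would need the matching change.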


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/drill/pull/1944#discussion_r362530135
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
+                String inputColumnName, String outputColumnName) {
+    if(isFixedWidth(vv)) {
+      addFixedWidthField(vv);
+    } else {
+      addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
     }
-
-    public static String printVV(ValueVector vv) {
-        String str = "null";
-        if (vv != null) {
-            str = vv.getField().getName() + " " + vv.getField().getType();
-        }
-        return str;
+  }
+
+  private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
+                                     OutputColumnType outputColumnType, String inputColumnName,
+                                     String outputColumnName) {
+    variableWidthColumnCount++;
+    ColumnWidthInfo columnWidthInfo;
+    logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
+            printVV(vv), variableWidthColumnCount, outputColumnType);
+    // Variable width transfers
+    if (outputColumnType == OutputColumnType.TRANSFER) {
+      VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
+      columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
+    } else if (isComplex(vv.getField().getType())) {
+      addComplexField(vv);
+      return;
+    } else {
+      // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
+      OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+      OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
+      columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
     }
-
-    void addComplexField(ValueVector vv) {
-        //Complex types are not yet supported. Just use a guess for the size
-        assert vv == null || isComplex(vv.getField().getType());
-        complexColumnsCount++;
-        // just a guess
-        totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
-        logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+    ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
+    Preconditions.checkState(existingInfo == null);
+  }
+
+  public static String printVV(ValueVector vv) {
+    String str = "null";
+    if (vv != null) {
+      str = vv.getField().getName() + " " + vv.getField().getType();
     }
-
-    void addFixedWidthField(ValueVector vv) {
-        assert isFixedWidth(vv);
-        fixedWidthColumnCount++;
-        int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
-        totalFixedWidthColumnWidth += fixedFieldWidth;
-        logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+    return str;
 
 Review comment:
   ```suggestion
       return vv == null ? "null" : vv.getField().toString();
   ```
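
For readers outside the PR, the suggestion can be tried in isolation with a rough sketch like the one below. `Field` and `Vector` are hypothetical stand-ins for Drill's `MaterializedField` and `ValueVector`, and the `toString()` format is an assumption chosen to mimic the original name-plus-type string; the real `MaterializedField.toString()` may print something different, so trace output could change.

```java
// Hypothetical stand-ins; these are not the Drill classes.
class PrintVVSketch {

  // Plays the role of MaterializedField.
  static final class Field {
    private final String name;
    private final String type;

    Field(String name, String type) {
      this.name = name;
      this.type = type;
    }

    // Assumed format, matching the original "name type" concatenation.
    @Override
    public String toString() { return name + " " + type; }
  }

  // Plays the role of ValueVector.
  static final class Vector {
    private final Field field;

    Vector(Field field) { this.field = field; }

    Field getField() { return field; }
  }

  // The one-line form proposed in the review comment.
  static String printVV(Vector vv) {
    return vv == null ? "null" : vv.getField().toString();
  }

  public static void main(String[] args) {
    System.out.println(printVV(null));
    System.out.println(printVV(new Vector(new Field("a", "VARCHAR"))));
  }
}
```

The ternary keeps the null guard from the original method while removing the mutable local variable.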


[GitHub] [drill] paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/drill/pull/1944#discussion_r362705846
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
+                String inputColumnName, String outputColumnName) {
+    if(isFixedWidth(vv)) {
+      addFixedWidthField(vv);
+    } else {
+      addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
     }
-
-    public static String printVV(ValueVector vv) {
-        String str = "null";
-        if (vv != null) {
-            str = vv.getField().getName() + " " + vv.getField().getType();
-        }
-        return str;
+  }
+
+  private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
+                                     OutputColumnType outputColumnType, String inputColumnName,
+                                     String outputColumnName) {
+    variableWidthColumnCount++;
+    ColumnWidthInfo columnWidthInfo;
+    logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
+            printVV(vv), variableWidthColumnCount, outputColumnType);
+    // Variable width transfers
+    if (outputColumnType == OutputColumnType.TRANSFER) {
+      VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
+      columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
+    } else if (isComplex(vv.getField().getType())) {
+      addComplexField(vv);
+      return;
+    } else {
+      // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
+      OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+      OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
+      columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
     }
-
-    void addComplexField(ValueVector vv) {
-        //Complex types are not yet supported. Just use a guess for the size
-        assert vv == null || isComplex(vv.getField().getType());
-        complexColumnsCount++;
-        // just a guess
-        totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
-        logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+    ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
+    Preconditions.checkState(existingInfo == null);
+  }
+
+  public static String printVV(ValueVector vv) {
+    String str = "null";
+    if (vv != null) {
+      str = vv.getField().getName() + " " + vv.getField().getType();
     }
-
-    void addFixedWidthField(ValueVector vv) {
-        assert isFixedWidth(vv);
-        fixedWidthColumnCount++;
-        int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
-        totalFixedWidthColumnWidth += fixedFieldWidth;
-        logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+    return str;
+  }
+
+  void addComplexField(ValueVector vv) {
+    //Complex types are not yet supported. Just use a guess for the size
+    assert vv == null || isComplex(vv.getField().getType());
+    complexColumnsCount++;
+    // just a guess
+    totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
+    logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+  }
+
+  void addFixedWidthField(ValueVector vv) {
+    assert isFixedWidth(vv);
+    fixedWidthColumnCount++;
+    int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
+    totalFixedWidthColumnWidth += fixedFieldWidth;
+    logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+  }
+
+  public void init(RecordBatch incomingBatch, ProjectRecordBatch outgoingBatch) {
+    setIncomingBatch(incomingBatch);
+    setOutgoingBatch(outgoingBatch);
+    reset();
+
+    RecordBatchStats.printConfiguredBatchSize(outgoingBatch.getRecordBatchStatsContext(),
+      getOutputBatchSize());
+  }
+
+  private void reset() {
+    rowWidth = 0;
+    totalFixedWidthColumnWidth = 0;
+    totalComplexColumnWidth = 0;
+
+    fixedWidthColumnCount = 0;
+    complexColumnsCount = 0;
+  }
+
+  @Override
+  public void update() {
+    long updateStartTime = System.currentTimeMillis();
+    RecordBatchSizer batchSizer = new RecordBatchSizer(incomingBatch);
+    long batchSizerEndTime = System.currentTimeMillis();
+
+    setRecordBatchSizer(batchSizer);
+    rowWidth = 0;
+    int totalVariableColumnWidth = 0;
+    for (String outputColumnName : outputColumnSizes.keySet()) {
+      ColumnWidthInfo columnWidthInfo = outputColumnSizes.get(outputColumnName);
+      int width = -1;
+      if (columnWidthInfo.isFixedWidth()) {
+        // fixed width columns are accumulated in totalFixedWidthColumnWidth
+        ShouldNotReachHere();
+      } else {
+        //Walk the tree of OutputWidthExpressions to get a FixedLenExpr
+        //As the tree is walked, the RecordBatchSizer and function annotations
+        //are looked-up to come up with the final FixedLenExpr
+        OutputWidthExpression savedWidthExpr = columnWidthInfo.getOutputExpression();
+        OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+        OutputWidthExpression reducedExpr = savedWidthExpr.accept(new OutputWidthVisitor(), state);
+        width = ((FixedLenExpr)reducedExpr).getDataWidth();
+        Preconditions.checkState(width >= 0);
+        int metadataWidth = getMetadataWidth(columnWidthInfo.outputVV);
+        logger.trace("update(): fieldName {} width: {} metadataWidth: {}",
+                columnWidthInfo.outputVV.getField().getName(), width, metadataWidth);
+        width += metadataWidth;
+      }
+      totalVariableColumnWidth += width;
     }
-
-    public void init(RecordBatch incomingBatch, ProjectRecordBatch outgoingBatch) {
-        setIncomingBatch(incomingBatch);
-        setOutgoingBatch(outgoingBatch);
-        reset();
-
-        RecordBatchStats.printConfiguredBatchSize(outgoingBatch.getRecordBatchStatsContext(),
-          getOutputBatchSize());
+    rowWidth += totalFixedWidthColumnWidth;
+    rowWidth += totalComplexColumnWidth;
+    rowWidth += totalVariableColumnWidth;
+    int outPutRowCount;
+    if (rowWidth != 0) {
+      //if rowWidth is not zero, set the output row count in the sizer
+      setOutputRowCount(getOutputBatchSize(), rowWidth);
+      // if more rows can be allowed than the incoming row count, then set the
+      // output row count to the incoming row count.
+      outPutRowCount = Math.min(getOutputRowCount(), batchSizer.rowCount());
+    } else {
+      // if rowWidth == 0 then the memory manager does
+      // not have sufficient information to size the batch
+      // let the entire batch pass through.
+      // If incoming rc == 0, all RB Sizer look-ups will have
+      // 0 width and so total width can be 0
+      outPutRowCount = incomingBatch.getRecordCount();
     }
-
-    private void reset() {
-        rowWidth = 0;
-        totalFixedWidthColumnWidth = 0;
-        totalComplexColumnWidth = 0;
-
-        fixedWidthColumnCount = 0;
-        complexColumnsCount = 0;
+    setOutputRowCount(outPutRowCount);
+    long updateEndTime = System.currentTimeMillis();
+    logger.trace("update() : Output RC {}, BatchSizer RC {}, incoming RC {}, width {}, total fixed width {}"
+                + ", total variable width {}, total complex width {}, batchSizer time {} ms, update time {}  ms"
+                + ", manager {}, incoming {}",outPutRowCount, batchSizer.rowCount(), incomingBatch.getRecordCount(),
+                rowWidth, totalFixedWidthColumnWidth, totalVariableColumnWidth, totalComplexColumnWidth,
+                (batchSizerEndTime - updateStartTime),(updateEndTime - updateStartTime), this, incomingBatch);
+
+    RecordBatchStats.logRecordBatchStats(RecordBatchIOType.INPUT, getRecordBatchSizer(), outgoingBatch.getRecordBatchStatsContext());
+    updateIncomingStats();
+  }
+
+  public static int getMetadataWidth(ValueVector vv) {
+    int width = 0;
+    if (vv instanceof NullableVector) {
+      width += ((NullableVector)vv).getBitsVector().getPayloadByteCount(1);
     }
 
-    @Override
-    public void update() {
-        long updateStartTime = System.currentTimeMillis();
-        RecordBatchSizer batchSizer = new RecordBatchSizer(incomingBatch);
-        long batchSizerEndTime = System.currentTimeMillis();
-
-        setRecordBatchSizer(batchSizer);
-        rowWidth = 0;
-        int totalVariableColumnWidth = 0;
-        for (String outputColumnName : outputColumnSizes.keySet()) {
-            ColumnWidthInfo columnWidthInfo = outputColumnSizes.get(outputColumnName);
-            int width = -1;
-            if (columnWidthInfo.isFixedWidth()) {
-                // fixed width columns are accumulated in totalFixedWidthColumnWidth
-                ShouldNotReachHere();
-            } else {
-                //Walk the tree of OutputWidthExpressions to get a FixedLenExpr
-                //As the tree is walked, the RecordBatchSizer and function annotations
-                //are looked-up to come up with the final FixedLenExpr
-                OutputWidthExpression savedWidthExpr = columnWidthInfo.getOutputExpression();
-                OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-                OutputWidthExpression reducedExpr = savedWidthExpr.accept(new OutputWidthVisitor(), state);
-                width = ((FixedLenExpr)reducedExpr).getDataWidth();
-                Preconditions.checkState(width >= 0);
-                int metadataWidth = getMetadataWidth(columnWidthInfo.outputVV);
-                logger.trace("update(): fieldName {} width: {} metadataWidth: {}",
-                        columnWidthInfo.outputVV.getField().getName(), width, metadataWidth);
-                width += metadataWidth;
-            }
-            totalVariableColumnWidth += width;
-        }
-        rowWidth += totalFixedWidthColumnWidth;
-        rowWidth += totalComplexColumnWidth;
-        rowWidth += totalVariableColumnWidth;
-        int outPutRowCount;
-        if (rowWidth != 0) {
-            //if rowWidth is not zero, set the output row count in the sizer
-            setOutputRowCount(getOutputBatchSize(), rowWidth);
-            // if more rows can be allowed than the incoming row count, then set the
-            // output row count to the incoming row count.
-            outPutRowCount = Math.min(getOutputRowCount(), batchSizer.rowCount());
-        } else {
-            // if rowWidth == 0 then the memory manager does
-            // not have sufficient information to size the batch
-            // let the entire batch pass through.
-            // If incoming rc == 0, all RB Sizer look-ups will have
-            // 0 width and so total width can be 0
-            outPutRowCount = incomingBatch.getRecordCount();
-        }
-        setOutputRowCount(outPutRowCount);
-        long updateEndTime = System.currentTimeMillis();
-        logger.trace("update() : Output RC {}, BatchSizer RC {}, incoming RC {}, width {}, total fixed width {}"
-                    + ", total variable width {}, total complex width {}, batchSizer time {} ms, update time {}  ms"
-                    + ", manager {}, incoming {}",outPutRowCount, batchSizer.rowCount(), incomingBatch.getRecordCount(),
-                    rowWidth, totalFixedWidthColumnWidth, totalVariableColumnWidth, totalComplexColumnWidth,
-                    (batchSizerEndTime - updateStartTime),(updateEndTime - updateStartTime), this, incomingBatch);
-
-        RecordBatchStats.logRecordBatchStats(RecordBatchIOType.INPUT, getRecordBatchSizer(), outgoingBatch.getRecordBatchStatsContext());
-        updateIncomingStats();
+    if (vv instanceof VariableWidthVector) {
+      width += ((VariableWidthVector)vv).getOffsetVector().getPayloadByteCount(1);
 
 Review comment:
   OK, so the right way to do this would be either a) to provide a method on the VV base class or, better, b) to generate a `TypeHelper` method, since this information can be computed from the type alone.
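
To make the type-only idea in the comment above concrete, here is a minimal sketch, not Drill code. `MetadataWidthSketch`, `Kind`, `Mode`, and the two width constants are hypothetical names invented for illustration. The sizes (one byte per value for the nullable "bits" vector, four bytes per value for the offset vector) are assumptions about the vector layout rather than values taken from this patch.

```java
import java.util.EnumSet;

/**
 * Hypothetical sketch: per-value metadata width derived from type
 * information alone, without inspecting a vector instance.
 * None of these names exist in Drill; the byte sizes are assumptions.
 */
final class MetadataWidthSketch {

  /** Simplified stand-in for a column's data mode. */
  enum Mode { REQUIRED, OPTIONAL }

  /** Simplified stand-in for a minor type; only a few kinds are listed. */
  enum Kind { INT, BIGINT, VARCHAR, VAR16CHAR, VARBINARY }

  // Assumed sizes: one byte per value for the "is set" bits vector,
  // four bytes per value for the offset vector.
  static final int BITS_WIDTH = 1;
  static final int OFFSET_WIDTH = 4;

  private static final EnumSet<Kind> VARIABLE_WIDTH =
      EnumSet.of(Kind.VARCHAR, Kind.VAR16CHAR, Kind.VARBINARY);

  /** True for the kinds that need an offset vector. */
  static boolean isVariableWidth(Kind kind) {
    return VARIABLE_WIDTH.contains(kind);
  }

  /** Sums the per-value overhead implied by the kind and mode. */
  static int metadataWidth(Kind kind, Mode mode) {
    int width = 0;
    if (mode == Mode.OPTIONAL) {
      width += BITS_WIDTH;      // nullable columns carry a bits vector
    }
    if (isVariableWidth(kind)) {
      width += OFFSET_WIDTH;    // variable-width columns carry offsets
    }
    return width;
  }
}
```

A generated helper along these lines would let callers such as `getMetadataWidth()` avoid `instanceof` checks on the vector, at the cost of keeping the per-type table in step with the vector implementations.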


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362529251
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
+                String inputColumnName, String outputColumnName) {
+    if(isFixedWidth(vv)) {
+      addFixedWidthField(vv);
+    } else {
+      addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
     }
-
-    public static String printVV(ValueVector vv) {
-        String str = "null";
-        if (vv != null) {
-            str = vv.getField().getName() + " " + vv.getField().getType();
-        }
-        return str;
+  }
+
+  private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
+                                     OutputColumnType outputColumnType, String inputColumnName,
+                                     String outputColumnName) {
+    variableWidthColumnCount++;
+    ColumnWidthInfo columnWidthInfo;
+    logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
+            printVV(vv), variableWidthColumnCount, outputColumnType);
+    // Variable width transfers
+    if (outputColumnType == OutputColumnType.TRANSFER) {
+      VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
+      columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
+    } else if (isComplex(vv.getField().getType())) {
+      addComplexField(vv);
+      return;
+    } else {
+      // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
+      OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+      OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
+      columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
     }
-
-    void addComplexField(ValueVector vv) {
-        //Complex types are not yet supported. Just use a guess for the size
-        assert vv == null || isComplex(vv.getField().getType());
-        complexColumnsCount++;
-        // just a guess
-        totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
-        logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+    ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
+    Preconditions.checkState(existingInfo == null);
+  }
 
 Review comment:
   This could be rewritten so that the column info is constructed in only one place, something like:
   ```java
     private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
                                        OutputColumnType outputColumnType, String inputColumnName,
                                        String outputColumnName) {
       variableWidthColumnCount++;
       logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
           printVV(vv), variableWidthColumnCount, outputColumnType);
       OutputWidthExpression outWidthExpr;
       if (outputColumnType == OutputColumnType.TRANSFER) {
         // Variable width transfers
         outWidthExpr = new VarLenReadExpr(inputColumnName);
       } else if (isComplex(vv.getField().getType())) {
         addComplexField(vv);
         return;
       } else {
         // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
         outWidthExpr = logicalExpression.accept(new OutputWidthVisitor(), new OutputWidthVisitorState(this));
       }
       VariableWidthColumnInfo columnWidthInfo = new VariableWidthColumnInfo(outWidthExpr, outputColumnType,
           WidthType.VARIABLE, -1, vv);// fieldWidth has to be obtained from the OutputWidthExpression
       VariableWidthColumnInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
       Preconditions.checkState(existingInfo == null);
     }
   ```
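
The shape of the suggestion above (choose the width expression in the branches, return early for the complex case, then construct and register once) can be illustrated with a minimal, self-contained sketch. Every name in it (`WidthRegistrySketch`, `ColumnKind`, the string-valued "expressions") is a hypothetical stand-in chosen for illustration, not a Drill class or API, and the duplicate check only assumes the same "put, then verify nothing was replaced" behaviour that the `Preconditions.checkState` call in the diff expresses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the pattern in the review suggestion; not Drill code.
class WidthRegistrySketch {
  enum ColumnKind { TRANSFER, COMPLEX, NEW }

  // Counts columns that take the early-return "complex" path.
  private int complexCount;
  // Maps an output column name to a (string) stand-in for its width expression.
  private final Map<String, String> widthExprByColumn = new HashMap<>();

  void addVariableWidthColumn(String outputName, ColumnKind kind, String inputName) {
    String widthExpr;
    if (kind == ColumnKind.TRANSFER) {
      // Transfers: width is read from the input column.
      widthExpr = "read(" + inputName + ")";
    } else if (kind == ColumnKind.COMPLEX) {
      // Complex columns are only counted, never registered in the map.
      complexCount++;
      return;
    } else {
      // New columns: width is derived from the producing expression.
      widthExpr = "computed(" + outputName + ")";
    }
    // Single registration point shared by both remaining branches.
    String existing = widthExprByColumn.put(outputName, widthExpr);
    if (existing != null) {
      throw new IllegalStateException("Duplicate output column: " + outputName);
    }
  }

  String widthExprFor(String outputName) { return widthExprByColumn.get(outputName); }

  int complexCount() { return complexCount; }
}
```

Keeping one registration point means the duplicate-name check cannot be missed if another branch is added later.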


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362499071
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
 
 Review comment:
   ```suggestion
     private ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362534494
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
+                String inputColumnName, String outputColumnName) {
+    if(isFixedWidth(vv)) {
+      addFixedWidthField(vv);
+    } else {
+      addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
     }
-
-    public static String printVV(ValueVector vv) {
-        String str = "null";
-        if (vv != null) {
-            str = vv.getField().getName() + " " + vv.getField().getType();
-        }
-        return str;
+  }
+
+  private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
+                                     OutputColumnType outputColumnType, String inputColumnName,
+                                     String outputColumnName) {
+    variableWidthColumnCount++;
+    ColumnWidthInfo columnWidthInfo;
+    logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
+            printVV(vv), variableWidthColumnCount, outputColumnType);
+    // Variable width transfers
+    if (outputColumnType == OutputColumnType.TRANSFER) {
+      VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
+      columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
+    } else if (isComplex(vv.getField().getType())) {
+      addComplexField(vv);
+      return;
+    } else {
+      // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
+      OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+      OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
+      columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
     }
-
-    void addComplexField(ValueVector vv) {
-        //Complex types are not yet supported. Just use a guess for the size
-        assert vv == null || isComplex(vv.getField().getType());
-        complexColumnsCount++;
-        // just a guess
-        totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
-        logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+    ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
+    Preconditions.checkState(existingInfo == null);
+  }
+
+  public static String printVV(ValueVector vv) {
+    String str = "null";
+    if (vv != null) {
+      str = vv.getField().getName() + " " + vv.getField().getType();
     }
-
-    void addFixedWidthField(ValueVector vv) {
-        assert isFixedWidth(vv);
-        fixedWidthColumnCount++;
-        int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
-        totalFixedWidthColumnWidth += fixedFieldWidth;
-        logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+    return str;
+  }
+
+  void addComplexField(ValueVector vv) {
+    //Complex types are not yet supported. Just use a guess for the size
+    assert vv == null || isComplex(vv.getField().getType());
+    complexColumnsCount++;
+    // just a guess
+    totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
+    logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+  }
+
+  void addFixedWidthField(ValueVector vv) {
+    assert isFixedWidth(vv);
+    fixedWidthColumnCount++;
+    int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
+    totalFixedWidthColumnWidth += fixedFieldWidth;
+    logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+  }
+
+  public void init(RecordBatch incomingBatch, ProjectRecordBatch outgoingBatch) {
+    setIncomingBatch(incomingBatch);
+    setOutgoingBatch(outgoingBatch);
+    reset();
+
+    RecordBatchStats.printConfiguredBatchSize(outgoingBatch.getRecordBatchStatsContext(),
+      getOutputBatchSize());
+  }
+
+  private void reset() {
+    rowWidth = 0;
+    totalFixedWidthColumnWidth = 0;
+    totalComplexColumnWidth = 0;
+
+    fixedWidthColumnCount = 0;
+    complexColumnsCount = 0;
+  }
+
+  @Override
+  public void update() {
+    long updateStartTime = System.currentTimeMillis();
+    RecordBatchSizer batchSizer = new RecordBatchSizer(incomingBatch);
+    long batchSizerEndTime = System.currentTimeMillis();
+
+    setRecordBatchSizer(batchSizer);
+    rowWidth = 0;
+    int totalVariableColumnWidth = 0;
+    for (String outputColumnName : outputColumnSizes.keySet()) {
+      ColumnWidthInfo columnWidthInfo = outputColumnSizes.get(outputColumnName);
+      int width = -1;
+      if (columnWidthInfo.isFixedWidth()) {
+        // fixed width columns are accumulated in totalFixedWidthColumnWidth
+        ShouldNotReachHere();
+      } else {
+        //Walk the tree of OutputWidthExpressions to get a FixedLenExpr
+        //As the tree is walked, the RecordBatchSizer and function annotations
+        //are looked-up to come up with the final FixedLenExpr
+        OutputWidthExpression savedWidthExpr = columnWidthInfo.getOutputExpression();
+        OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+        OutputWidthExpression reducedExpr = savedWidthExpr.accept(new OutputWidthVisitor(), state);
+        width = ((FixedLenExpr)reducedExpr).getDataWidth();
+        Preconditions.checkState(width >= 0);
+        int metadataWidth = getMetadataWidth(columnWidthInfo.outputVV);
+        logger.trace("update(): fieldName {} width: {} metadataWidth: {}",
+                columnWidthInfo.outputVV.getField().getName(), width, metadataWidth);
+        width += metadataWidth;
+      }
+      totalVariableColumnWidth += width;
     }
-
-    public void init(RecordBatch incomingBatch, ProjectRecordBatch outgoingBatch) {
-        setIncomingBatch(incomingBatch);
-        setOutgoingBatch(outgoingBatch);
-        reset();
-
-        RecordBatchStats.printConfiguredBatchSize(outgoingBatch.getRecordBatchStatsContext(),
-          getOutputBatchSize());
+    rowWidth += totalFixedWidthColumnWidth;
+    rowWidth += totalComplexColumnWidth;
+    rowWidth += totalVariableColumnWidth;
+    int outPutRowCount;
+    if (rowWidth != 0) {
+      //if rowWidth is not zero, set the output row count in the sizer
+      setOutputRowCount(getOutputBatchSize(), rowWidth);
+      // if more rows can be allowed than the incoming row count, then set the
+      // output row count to the incoming row count.
+      outPutRowCount = Math.min(getOutputRowCount(), batchSizer.rowCount());
+    } else {
+      // if rowWidth == 0 then the memory manager does
+      // not have sufficient information to size the batch
+      // let the entire batch pass through.
+      // If incoming rc == 0, all RB Sizer look-ups will have
+      // 0 width and so total width can be 0
+      outPutRowCount = incomingBatch.getRecordCount();
     }
-
-    private void reset() {
-        rowWidth = 0;
-        totalFixedWidthColumnWidth = 0;
-        totalComplexColumnWidth = 0;
-
-        fixedWidthColumnCount = 0;
-        complexColumnsCount = 0;
+    setOutputRowCount(outPutRowCount);
+    long updateEndTime = System.currentTimeMillis();
+    logger.trace("update() : Output RC {}, BatchSizer RC {}, incoming RC {}, width {}, total fixed width {}"
+                + ", total variable width {}, total complex width {}, batchSizer time {} ms, update time {}  ms"
+                + ", manager {}, incoming {}",outPutRowCount, batchSizer.rowCount(), incomingBatch.getRecordCount(),
+                rowWidth, totalFixedWidthColumnWidth, totalVariableColumnWidth, totalComplexColumnWidth,
+                (batchSizerEndTime - updateStartTime),(updateEndTime - updateStartTime), this, incomingBatch);
+
+    RecordBatchStats.logRecordBatchStats(RecordBatchIOType.INPUT, getRecordBatchSizer(), outgoingBatch.getRecordBatchStatsContext());
+    updateIncomingStats();
+  }
+
+  public static int getMetadataWidth(ValueVector vv) {
+    int width = 0;
+    if (vv instanceof NullableVector) {
+      width += ((NullableVector)vv).getBitsVector().getPayloadByteCount(1);
     }
 
-    @Override
-    public void update() {
-        long updateStartTime = System.currentTimeMillis();
-        RecordBatchSizer batchSizer = new RecordBatchSizer(incomingBatch);
-        long batchSizerEndTime = System.currentTimeMillis();
-
-        setRecordBatchSizer(batchSizer);
-        rowWidth = 0;
-        int totalVariableColumnWidth = 0;
-        for (String outputColumnName : outputColumnSizes.keySet()) {
-            ColumnWidthInfo columnWidthInfo = outputColumnSizes.get(outputColumnName);
-            int width = -1;
-            if (columnWidthInfo.isFixedWidth()) {
-                // fixed width columns are accumulated in totalFixedWidthColumnWidth
-                ShouldNotReachHere();
-            } else {
-                //Walk the tree of OutputWidthExpressions to get a FixedLenExpr
-                //As the tree is walked, the RecordBatchSizer and function annotations
-                //are looked-up to come up with the final FixedLenExpr
-                OutputWidthExpression savedWidthExpr = columnWidthInfo.getOutputExpression();
-                OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-                OutputWidthExpression reducedExpr = savedWidthExpr.accept(new OutputWidthVisitor(), state);
-                width = ((FixedLenExpr)reducedExpr).getDataWidth();
-                Preconditions.checkState(width >= 0);
-                int metadataWidth = getMetadataWidth(columnWidthInfo.outputVV);
-                logger.trace("update(): fieldName {} width: {} metadataWidth: {}",
-                        columnWidthInfo.outputVV.getField().getName(), width, metadataWidth);
-                width += metadataWidth;
-            }
-            totalVariableColumnWidth += width;
-        }
-        rowWidth += totalFixedWidthColumnWidth;
-        rowWidth += totalComplexColumnWidth;
-        rowWidth += totalVariableColumnWidth;
-        int outPutRowCount;
-        if (rowWidth != 0) {
-            //if rowWidth is not zero, set the output row count in the sizer
-            setOutputRowCount(getOutputBatchSize(), rowWidth);
-            // if more rows can be allowed than the incoming row count, then set the
-            // output row count to the incoming row count.
-            outPutRowCount = Math.min(getOutputRowCount(), batchSizer.rowCount());
-        } else {
-            // if rowWidth == 0 then the memory manager does
-            // not have sufficient information to size the batch
-            // let the entire batch pass through.
-            // If incoming rc == 0, all RB Sizer look-ups will have
-            // 0 width and so total width can be 0
-            outPutRowCount = incomingBatch.getRecordCount();
-        }
-        setOutputRowCount(outPutRowCount);
-        long updateEndTime = System.currentTimeMillis();
-        logger.trace("update() : Output RC {}, BatchSizer RC {}, incoming RC {}, width {}, total fixed width {}"
-                    + ", total variable width {}, total complex width {}, batchSizer time {} ms, update time {}  ms"
-                    + ", manager {}, incoming {}",outPutRowCount, batchSizer.rowCount(), incomingBatch.getRecordCount(),
-                    rowWidth, totalFixedWidthColumnWidth, totalVariableColumnWidth, totalComplexColumnWidth,
-                    (batchSizerEndTime - updateStartTime),(updateEndTime - updateStartTime), this, incomingBatch);
-
-        RecordBatchStats.logRecordBatchStats(RecordBatchIOType.INPUT, getRecordBatchSizer(), outgoingBatch.getRecordBatchStatsContext());
-        updateIncomingStats();
+    if (vv instanceof VariableWidthVector) {
+      width += ((VariableWidthVector)vv).getOffsetVector().getPayloadByteCount(1);
     }
 
-    public static int getMetadataWidth(ValueVector vv) {
-        int width = 0;
-        if (vv instanceof NullableVector) {
-            width += ((NullableVector)vv).getBitsVector().getPayloadByteCount(1);
-        }
-
-        if (vv instanceof VariableWidthVector) {
-            width += ((VariableWidthVector)vv).getOffsetVector().getPayloadByteCount(1);
-        }
-
-        if (vv instanceof BaseRepeatedValueVector) {
-            width += ((BaseRepeatedValueVector)vv).getOffsetVector().getPayloadByteCount(1);
-            width += (getMetadataWidth(((BaseRepeatedValueVector)vv).getDataVector()) * RepeatedValueVector.DEFAULT_REPEAT_PER_RECORD);
-        }
-        return width;
+    if (vv instanceof BaseRepeatedValueVector) {
+      width += ((BaseRepeatedValueVector)vv).getOffsetVector().getPayloadByteCount(1);
+      width += (getMetadataWidth(((BaseRepeatedValueVector)vv).getDataVector()) * RepeatedValueVector.DEFAULT_REPEAT_PER_RECORD);
 
 Review comment:
   ```suggestion
         width += (getMetadataWidth(((BaseRepeatedValueVector) vv).getDataVector()) * RepeatedValueVector.DEFAULT_REPEAT_PER_RECORD);
   ```
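
Note for readers of the hunk above: the tail of `update()` sums the fixed, complex and variable column widths into `rowWidth`, then caps the output row count at the incoming row count, or passes the whole batch through when no width is known. A minimal sketch of that arithmetic follows. `RowCountSketch` and `outputRowCount` are hypothetical names, and the plain integer division is an assumption standing in for `setOutputRowCount(getOutputBatchSize(), rowWidth)`, whose real behavior is not shown in this diff.

```java
/** Hypothetical stand-in for the sizing step; not Drill's RecordBatchMemoryManager. */
final class RowCountSketch {

  private RowCountSketch() { }

  /**
   * Returns how many rows to emit for one incoming batch.
   * outputBatchSize is the configured batch budget in bytes; the three
   * width arguments are per-row byte estimates for each column group.
   */
  static int outputRowCount(int outputBatchSize, int fixedWidth, int variableWidth,
                            int complexWidth, int incomingRowCount) {
    int rowWidth = fixedWidth + variableWidth + complexWidth;
    if (rowWidth == 0) {
      // No width information (for example, an empty incoming batch):
      // let the entire batch pass through.
      return incomingRowCount;
    }
    // Assumption: plain integer division. The real method may clamp or round.
    int rowsThatFit = outputBatchSize / rowWidth;
    // Never emit more rows than the incoming batch holds.
    return Math.min(rowsThatFit, incomingRowCount);
  }

  public static void main(String[] args) {
    System.out.println(outputRowCount(1024, 8, 50, 6, 100)); // 1024 / 64 = 16
    System.out.println(outputRowCount(1024, 0, 0, 0, 100));  // width unknown: 100
  }
}
```

The zero-width branch matters because every sizer lookup returns 0 when the incoming record count is 0, so dividing by `rowWidth` unconditionally would fail on empty batches.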


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362495815
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
 Review comment:
   ```suggestion
       OutputWidthExpression getOutputExpression() { return outputExpression; }
   ```
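
Note for readers of the hunk above: the class Javadoc describes two phases. At setup, a width-expression tree is recorded for each variable-width column; just before each batch, that tree is reduced to a fixed width using the sizes observed in the incoming batch. The sketch below illustrates the idea only. `WidthTreeSketch`, `WidthExpr`, `fixed`, `columnRead` and `sum` are hypothetical names, not Drill's `OutputWidthExpression` or `OutputWidthVisitor`, and treating a function's output width as the sum of its argument widths is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of "build the tree once, reduce it per batch". */
final class WidthTreeSketch {

  private WidthTreeSketch() { }

  /** A node that reduces to a byte width, given per-column observed widths. */
  interface WidthExpr {
    int reduce(Map<String, Integer> observedWidths);
  }

  /** A width that is already known at setup time. */
  static WidthExpr fixed(int width) {
    return observed -> width;
  }

  /** The width of an incoming column, looked up anew for every batch. */
  static WidthExpr columnRead(String columnName) {
    return observed -> observed.getOrDefault(columnName, 0);
  }

  /** Assumed estimate for a concat-like function: sum of the argument widths. */
  static WidthExpr sum(WidthExpr left, WidthExpr right) {
    return observed -> left.reduce(observed) + right.reduce(observed);
  }

  public static void main(String[] args) {
    // Setup phase: build the tree once for the output column.
    WidthExpr outputWidth = sum(columnRead("a"), fixed(4));

    // Execution phase: reduce the same tree against each batch's sizes.
    Map<String, Integer> firstBatch = new HashMap<>();
    firstBatch.put("a", 20);
    System.out.println(outputWidth.reduce(firstBatch));  // 20 + 4 = 24

    Map<String, Integer> secondBatch = new HashMap<>();
    secondBatch.put("a", 50);
    System.out.println(outputWidth.reduce(secondBatch)); // 50 + 4 = 54
  }
}
```

Keeping the tree separate from the per-batch sizes means the expression analysis happens once per schema, while only the cheap reduction repeats for each batch.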


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364223072
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectBatchBuilder.java
 ##########
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.TypedFieldId;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+public class ProjectBatchBuilder implements ProjectionMaterializer.BatchBuilder {
+  private final ProjectRecordBatch projectBatch;
+  private final VectorContainer container;
+  private final SchemaChangeCallBack callBack;
+  private final RecordBatch incomingBatch;
+  final List<TransferPair> transfers = new ArrayList<>();
+
+  public ProjectBatchBuilder(ProjectRecordBatch projectBatch, VectorContainer container,
+      SchemaChangeCallBack callBack, RecordBatch incomingBatch) {
+    this.projectBatch = projectBatch;
+    this.container = container;
+    this.callBack = callBack;
+    this.incomingBatch = incomingBatch;
+  }
+
+  @Override
+  public void addTransferField(String name, ValueVector vvIn) {
+    FieldReference ref = new FieldReference(name);
+    ValueVector vvOut = container.addOrGet(MaterializedField.create(ref.getAsNamePart().getName(),
+      vvIn.getField().getType()), callBack);
+    projectBatch.memoryManager.addTransferField(vvIn, vvIn.getField().getName(), vvOut.getField().getName());
+    transfers.add(vvIn.makeTransferPair(vvOut));
+  }
+
+  @Override
+  public int addDirectTransfer(FieldReference ref, ValueVectorReadExpression vectorRead) {
+    TypedFieldId id = vectorRead.getFieldId();
+    ValueVector vvIn = incomingBatch.getValueAccessorById(id.getIntermediateClass(), id.getFieldIds()).getValueVector();
+    Preconditions.checkNotNull(incomingBatch);
+
+    ValueVector vvOut =
+        container.addOrGet(MaterializedField.create(ref.getLastSegment().getNameSegment().getPath(),
+        vectorRead.getMajorType()), callBack);
+    TransferPair tp = vvIn.makeTransferPair(vvOut);
+    projectBatch.memoryManager.addTransferField(vvIn, TypedFieldId.getPath(id, incomingBatch), vvOut.getField().getName());
+    transfers.add(tp);
+    return vectorRead.getFieldId().getFieldIds()[0];
+  }
+
+  @Override
+  public ValueVectorWriteExpression addOutputVector(String name, LogicalExpression expr) {
+    MaterializedField outputField = MaterializedField.create(name, expr.getMajorType());
+    ValueVector vv = container.addOrGet(outputField, callBack);
+    projectBatch.allocationVectors.add(vv);
+    TypedFieldId fid = container.getValueVectorId(SchemaPath.getSimplePath(outputField.getName()));
+    ValueVectorWriteExpression write = new ValueVectorWriteExpression(fid, expr, true);
+    projectBatch.memoryManager.addNewField(vv, write);
+    return write;
+  }
+
+  @Override
+  public void addComplexField(FieldReference ref) {
+    initComplexWriters();
+    if (projectBatch.complexFieldReferencesList == null) {
+      projectBatch.complexFieldReferencesList = Lists.newArrayList();
+    } else {
+      projectBatch.complexFieldReferencesList.clear();
+    }
+
+    // save the field reference for later for getting schema when input is empty
+    projectBatch.complexFieldReferencesList.add(ref);
+    projectBatch.memoryManager.addComplexField(null); // this will just add an estimate to the row width
+  }
+
+  private void initComplexWriters() {
+    // Lazy initialization of the list of complex writers, if not done yet.
+    if (projectBatch.complexWriters == null) {
+      projectBatch.complexWriters = new ArrayList<>();
+    } else {
+      projectBatch.complexWriters.clear();
+    }
+  }
+
+  @Override
+  public ValueVectorWriteExpression addEvalVector(String outputName, LogicalExpression expr) {
+    MaterializedField outputField = MaterializedField.create(outputName, expr.getMajorType());
+    ValueVector ouputVector = container.addOrGet(outputField, callBack);
+    projectBatch.allocationVectors.add(ouputVector);
+    TypedFieldId fid = container.getValueVectorId(SchemaPath.getSimplePath(outputField.getName()));
+    boolean useSetSafe = !(ouputVector instanceof FixedWidthVector);
+    ValueVectorWriteExpression write = new ValueVectorWriteExpression(fid, expr, useSetSafe);
+    projectBatch.memoryManager.addNewField(ouputVector, write);
+
+    // We cannot do multiple transfers from the same vector. However we still
+    // need to instantiate the output vector.
+    if (expr instanceof ValueVectorReadExpression) {
+      ValueVectorReadExpression vectorRead = (ValueVectorReadExpression) expr;
+      if (!vectorRead.hasReadPath()) {
+        TypedFieldId id = vectorRead.getFieldId();
+        ValueVector vvIn = incomingBatch.getValueAccessorById(id.getIntermediateClass(),
+                id.getFieldIds()).getValueVector();
+        vvIn.makeTransferPair(ouputVector);
+      }
+    }
+    return write;
+  }
+}
 
 Review comment:
   ```suggestion
   }
   
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362530889
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
+                String inputColumnName, String outputColumnName) {
+    if(isFixedWidth(vv)) {
+      addFixedWidthField(vv);
+    } else {
+      addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
     }
-
-    public static String printVV(ValueVector vv) {
-        String str = "null";
-        if (vv != null) {
-            str = vv.getField().getName() + " " + vv.getField().getType();
-        }
-        return str;
+  }
+
+  private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
+                                     OutputColumnType outputColumnType, String inputColumnName,
+                                     String outputColumnName) {
+    variableWidthColumnCount++;
+    ColumnWidthInfo columnWidthInfo;
+    logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
+            printVV(vv), variableWidthColumnCount, outputColumnType);
+    // Variable width transfers
+    if (outputColumnType == OutputColumnType.TRANSFER) {
+      VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
+      columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
+    } else if (isComplex(vv.getField().getType())) {
+      addComplexField(vv);
+      return;
+    } else {
+      // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
+      OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+      OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
+      columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
     }
-
-    void addComplexField(ValueVector vv) {
-        //Complex types are not yet supported. Just use a guess for the size
-        assert vv == null || isComplex(vv.getField().getType());
-        complexColumnsCount++;
-        // just a guess
-        totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
-        logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+    ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
+    Preconditions.checkState(existingInfo == null);
+  }
+
+  public static String printVV(ValueVector vv) {
+    String str = "null";
+    if (vv != null) {
+      str = vv.getField().getName() + " " + vv.getField().getType();
     }
-
-    void addFixedWidthField(ValueVector vv) {
-        assert isFixedWidth(vv);
-        fixedWidthColumnCount++;
-        int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
-        totalFixedWidthColumnWidth += fixedFieldWidth;
-        logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+    return str;
+  }
+
+  void addComplexField(ValueVector vv) {
+    //Complex types are not yet supported. Just use a guess for the size
+    assert vv == null || isComplex(vv.getField().getType());
+    complexColumnsCount++;
+    // just a guess
+    totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
+    logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+  }
+
+  void addFixedWidthField(ValueVector vv) {
 
 Review comment:
   ```suggestion
    private void addFixedWidthField(ValueVector vv) {
   ```


[GitHub] [drill] paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571241220
 
 
   One more thought about the CG. While working on another issue, a couple of other worthwhile goals slowly dawned on me, in addition to those mentioned earlier.
   
   We mentioned above that CG combines logical planning and code gen, creating a very complex bit of code. Again, some of us are not smart enough to be able to quickly understand the combined logic, so it would make sense to separate out the two steps so that average folks can more easily understand and modify them.
   
   Then, we need to think about debugging and testing. As originally implemented, CG was a direct path from operator to running byte codes. The original authors were smart enough to be able to debug the result by stepping through the compiled byte codes. Most of us are not that smart. So, a few years back we added the "plain Java" and "save code for debugging" modes, which made it possible for us newbies to view and step through the Java code.
   
   Still, in order to debug, we need to run Drill, find a useful unit test, run the query, and then debug CG in the context of running a query. We are limited in what we can test based on what queries we can find or create to set up the scenarios of interest.
   
   Better would be to apply unit testing principles: separate out CG so we can set up a set of inputs, run CG, and diff the resulting code against a golden copy. This is how we debugged the Impala planner and it worked very well. This way, we can easily set up every data type without needing to use a zillion different input file formats. That is:
   
   ```
   (operator config, schema, options) --> CG --> (Java code, exec plan)
   ```
   
   Where the "exec plan" would be things like the knick-knacks that the `VectorState` currently wraps. That is, rather than having CG set up vectors, maybe CG creates a plan that says, "here are the vectors you will need", then the operator creates the vectors. Still pretty hand-wavey at this point.
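
A minimal sketch of the "diff against a golden copy" step described above, assuming the generated Java code is available as a string. `GoldenCodeCheck` and its methods are hypothetical names used only for illustration; they are not part of Drill, and nothing here reflects how Drill's code generator is actually invoked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill: compares generated code
// against a stored "golden" copy, line by line.
final class GoldenCodeCheck {

  // Splits code into lines, dropping trailing whitespace and trailing
  // blank lines so purely cosmetic differences do not fail the check.
  static List<String> normalize(String code) {
    List<String> lines = new ArrayList<>();
    for (String line : code.split("\\R", -1)) {
      lines.add(line.replaceAll("\\s+$", ""));
    }
    while (!lines.isEmpty() && lines.get(lines.size() - 1).isEmpty()) {
      lines.remove(lines.size() - 1);
    }
    return lines;
  }

  // Returns one message per differing line; an empty list means the
  // generated code matches the golden copy.
  static List<String> diff(String expected, String actual) {
    List<String> exp = normalize(expected);
    List<String> act = normalize(actual);
    List<String> diffs = new ArrayList<>();
    int max = Math.max(exp.size(), act.size());
    for (int i = 0; i < max; i++) {
      String e = i < exp.size() ? exp.get(i) : "<missing>";
      String a = i < act.size() ? act.get(i) : "<missing>";
      if (!e.equals(a)) {
        diffs.add("line " + (i + 1) + ": expected [" + e + "] but was [" + a + "]");
      }
    }
    return diffs;
  }
}
```

A test along these lines would build the inputs (operator config, schema, options), run code generation, and fail if `GoldenCodeCheck.diff(golden, generated)` returns a non-empty list. The line-by-line comparison is a deliberately simple stand-in for a real diff tool.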
   
   This refactoring can be seen as a naive grasping toward that goal. We need to pull out CG, but the final form is not yet clear. The same can be said of the earlier refactoring of the external sort to separate out CG. Sometimes, just by taking a step, we can get a bit better understanding of where we need to go.
   
   All that said, the goal here is just to take a step, not to implement a final solution. I realize, however, that makes this PR hard to review since it is incomplete: it takes a step, but does not clearly state "a step to where?"


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364290284
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectionMaterializer.java
 ##########
 @@ -0,0 +1,625 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.commons.collections.map.CaseInsensitiveMap;
+import org.apache.drill.common.expression.ConvertExpression;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.ExpressionPosition;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.FunctionCallFactory;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.expression.PathSegment.NameSegment;
+import org.apache.drill.common.expression.fn.FunctionReplacementUtils;
+import org.apache.drill.common.logical.data.NamedExpression;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.expr.DrillFuncHolderExpr;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.expr.fn.FunctionLookupContext;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.StarColumnHelper;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.carrotsearch.hppc.IntHashSet;
+
+/**
+ * Plans the projection given the incoming and requested outgoing schemas. Works
+ * with the {@link VectorState} to create required vectors, writers and so on.
+ * Populates the code generator with the "projector" expressions.
+ */
+class ProjectionMaterializer {
+  private static final Logger logger = LoggerFactory.getLogger(ProjectionMaterializer.class);
+  private static final String EMPTY_STRING = "";
+
+  /**
+   * Abstracts the physical vector setup operations to separate
+   * the physical setup, in <code>ProjectRecordBatch</code>, from the
+   * logical setup in the materializer class.
+   */
+  public interface BatchBuilder {
+    void addTransferField(String name, ValueVector vvIn);
+    ValueVectorWriteExpression addOutputVector(String name, LogicalExpression expr);
+    int addDirectTransfer(FieldReference ref, ValueVectorReadExpression vectorRead);
+    void addComplexField(FieldReference ref);
+    ValueVectorWriteExpression addEvalVector(String outputName,
+        LogicalExpression expr);
+  }
+
+  private static class ClassifierResult {
+    private boolean isStar;
+    private List<String> outputNames;
+    private String prefix = "";
+    private final HashMap<String, Integer> prefixMap = Maps.newHashMap();
+    private final CaseInsensitiveMap outputMap = new CaseInsensitiveMap();
+    private final CaseInsensitiveMap sequenceMap = new CaseInsensitiveMap();
+
+    private void clear() {
+      isStar = false;
+      prefix = "";
+      if (outputNames != null) {
+        outputNames.clear();
+      }
+
+      // note: don't clear the internal maps since they have cumulative data..
+    }
+  }
+
+  private final ClassGenerator<Projector> cg;
+  private final VectorAccessible incomingBatch;
+  private final BatchSchema incomingSchema;
+  private final List<NamedExpression> exprSpec;
+  private final FunctionLookupContext functionLookupContext;
+  private final BatchBuilder batchBuilder;
+  private final boolean unionTypeEnabled;
+  private final ErrorCollector collector = new ErrorCollectorImpl();
+  private final ColumnExplorer columnExplorer;
+  private final IntHashSet transferFieldIds = new IntHashSet();
+  private final ProjectionMaterializer.ClassifierResult result = new ClassifierResult();
+  private boolean isAnyWildcard;
+  private boolean classify;
+
+  public ProjectionMaterializer(OptionManager options,
+      VectorAccessible incomingBatch, List<NamedExpression> exprSpec,
+      FunctionLookupContext functionLookupContext, BatchBuilder batchBuilder,
+      boolean unionTypeEnabled) {
+    this.incomingBatch = incomingBatch;
+    this.incomingSchema = incomingBatch.getSchema();
+    this.exprSpec = exprSpec;
+    this.functionLookupContext = functionLookupContext;
+    this.batchBuilder = batchBuilder;
+    this.unionTypeEnabled = unionTypeEnabled;
+    columnExplorer = new ColumnExplorer(options);
+    cg = CodeGenerator.getRoot(Projector.TEMPLATE_DEFINITION, options);
+  }
+
+  public Projector generateProjector(FragmentContext context, boolean saveCode)
+      throws ClassTransformationException, IOException, SchemaChangeException {
+    long setupNewSchemaStartTime = System.currentTimeMillis();
+    setup();
+    CodeGenerator<Projector> codeGen = cg.getCodeGenerator();
+    codeGen.plainJavaCapable(true);
+    codeGen.saveCodeForDebugging(saveCode);
+    Projector projector = context.getImplementationClass(codeGen);
+
+    long setupNewSchemaEndTime = System.currentTimeMillis();
+    logger.trace("generateProjector: time {}  ms, Project {}, incoming {}",
+             (setupNewSchemaEndTime - setupNewSchemaStartTime), exprSpec, incomingSchema);
+    return projector;
+  }
+
+  private void setup() throws SchemaChangeException {
+    List<NamedExpression> exprs = exprSpec != null ? exprSpec
+        : inferExpressions();
+    isAnyWildcard = isAnyWildcard(exprs);
+    classify = isClassificationNeeded(exprs);
+
+    for (NamedExpression namedExpression : exprs) {
+      setupExpression(namedExpression);
+    }
+  }
+
+  private List<NamedExpression> inferExpressions() {
+    List<NamedExpression> exprs = Lists.newArrayList();
 
 Review comment:
   Please replace with new ArrayList<>() if possible
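
For context, a minimal sketch of the requested change. Since Java 7 the diamond operator infers the element type, so the Guava factory `Lists.newArrayList()` can be replaced by a plain constructor call. The class and method names below are hypothetical, and strings stand in for `NamedExpression`; this is not the actual `ProjectionMaterializer` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the pattern only.
final class DiamondExample {

  static List<String> inferNames(List<String> incoming) {
    // Before: List<String> exprs = Lists.newArrayList();
    // After: the diamond operator infers <String> from the declaration.
    List<String> exprs = new ArrayList<>();
    for (String name : incoming) {
      exprs.add(name);
    }
    return exprs;
  }
}
```

The behavior is the same either way; the change only removes a dependency on the Guava helper for something the language already supports.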


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362507186
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
 
 Review comment:
   ```suggestion
     private void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362531195
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
+                String inputColumnName, String outputColumnName) {
+    if(isFixedWidth(vv)) {
+      addFixedWidthField(vv);
+    } else {
+      addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
     }
-
-    public static String printVV(ValueVector vv) {
-        String str = "null";
-        if (vv != null) {
-            str = vv.getField().getName() + " " + vv.getField().getType();
-        }
-        return str;
+  }
+
+  private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
+                                     OutputColumnType outputColumnType, String inputColumnName,
+                                     String outputColumnName) {
+    variableWidthColumnCount++;
+    ColumnWidthInfo columnWidthInfo;
+    logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
+            printVV(vv), variableWidthColumnCount, outputColumnType);
+    // Variable width transfers
+    if (outputColumnType == OutputColumnType.TRANSFER) {
+      VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
+      columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
+    } else if (isComplex(vv.getField().getType())) {
+      addComplexField(vv);
+      return;
+    } else {
+      // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
+      OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+      OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
+      columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
     }
-
-    void addComplexField(ValueVector vv) {
-        //Complex types are not yet supported. Just use a guess for the size
-        assert vv == null || isComplex(vv.getField().getType());
-        complexColumnsCount++;
-        // just a guess
-        totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
-        logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+    ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
+    Preconditions.checkState(existingInfo == null);
+  }
+
+  public static String printVV(ValueVector vv) {
+    String str = "null";
+    if (vv != null) {
+      str = vv.getField().getName() + " " + vv.getField().getType();
     }
-
-    void addFixedWidthField(ValueVector vv) {
-        assert isFixedWidth(vv);
-        fixedWidthColumnCount++;
-        int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
-        totalFixedWidthColumnWidth += fixedFieldWidth;
-        logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+    return str;
+  }
+
+  void addComplexField(ValueVector vv) {
+    //Complex types are not yet supported. Just use a guess for the size
+    assert vv == null || isComplex(vv.getField().getType());
+    complexColumnsCount++;
+    // just a guess
+    totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
+    logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+  }
+
+  void addFixedWidthField(ValueVector vv) {
+    assert isFixedWidth(vv);
+    fixedWidthColumnCount++;
+    int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
+    totalFixedWidthColumnWidth += fixedFieldWidth;
+    logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+  }
+
+  public void init(RecordBatch incomingBatch, ProjectRecordBatch outgoingBatch) {
+    setIncomingBatch(incomingBatch);
+    setOutgoingBatch(outgoingBatch);
+    reset();
+
+    RecordBatchStats.printConfiguredBatchSize(outgoingBatch.getRecordBatchStatsContext(),
+      getOutputBatchSize());
+  }
+
+  private void reset() {
+    rowWidth = 0;
+    totalFixedWidthColumnWidth = 0;
+    totalComplexColumnWidth = 0;
+
+    fixedWidthColumnCount = 0;
+    complexColumnsCount = 0;
+  }
+
+  @Override
+  public void update() {
+    long updateStartTime = System.currentTimeMillis();
+    RecordBatchSizer batchSizer = new RecordBatchSizer(incomingBatch);
+    long batchSizerEndTime = System.currentTimeMillis();
+
+    setRecordBatchSizer(batchSizer);
+    rowWidth = 0;
+    int totalVariableColumnWidth = 0;
+    for (String outputColumnName : outputColumnSizes.keySet()) {
+      ColumnWidthInfo columnWidthInfo = outputColumnSizes.get(outputColumnName);
+      int width = -1;
+      if (columnWidthInfo.isFixedWidth()) {
+        // fixed width columns are accumulated in totalFixedWidthColumnWidth
+        ShouldNotReachHere();
+      } else {
 
 Review comment:
   please remove


[GitHub] [drill] paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571414414
 
 
   @ihuzenko, went ahead and adopted the interface solution. The interface is probably a better idea: if we try to isolate code gen for testing, we can create a mock implementation that skips the actual vector work. Please let me know if this satisfies your concerns. 
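
A minimal sketch of the idea described above, not the code in this PR: `VectorPacker`, `RecordingPacker`, and `ProjectionPlanner` are hypothetical names chosen for illustration. The planner talks only to the interface, so a test can hand it a recording mock and check the planning decisions without allocating any vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the physical vector work that the planner delegates.
interface VectorPacker {
  void addTransfer(String inputColumn, String outputColumn);
  void addNewColumn(String outputColumn);
}

// Mock implementation: records each call instead of touching value vectors.
class RecordingPacker implements VectorPacker {
  final List<String> calls = new ArrayList<>();

  @Override
  public void addTransfer(String inputColumn, String outputColumn) {
    calls.add("transfer " + inputColumn + " -> " + outputColumn);
  }

  @Override
  public void addNewColumn(String outputColumn) {
    calls.add("new " + outputColumn);
  }
}

// Hypothetical logical planner: decides what to project, never how vectors are built.
class ProjectionPlanner {
  private final VectorPacker packer;

  ProjectionPlanner(VectorPacker packer) {
    this.packer = packer;
  }

  // A column projected as-is becomes a transfer; anything else is a new column.
  void project(String inputColumn, String outputColumn, boolean isPlainReference) {
    if (isPlainReference) {
      packer.addTransfer(inputColumn, outputColumn);
    } else {
      packer.addNewColumn(outputColumn);
    }
  }
}
```

A production implementation of the same interface would do the real vector setup, while tests of the planning logic would use only the recording version.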


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362493899
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
 
 Review comment:
   The ```width``` field is always ```-1``` and its getter is unused, so I suggest removing the field and the related code.
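
A rough sketch of what the class could look like after that removal. `OutputWidthExpression` and `ValueVector` are declared here as empty stand-ins only to keep the example self-contained; in Drill they are the real types, and the actual change may differ.

```java
// Empty stand-ins for the Drill types, declared only so the sketch compiles alone.
interface OutputWidthExpression {}
interface ValueVector {}

enum WidthType { FIXED, VARIABLE }
enum OutputColumnType { TRANSFER, NEW }

// ColumnWidthInfo without the always -1 width field, its constructor argument, and its getter.
final class ColumnWidthInfo {
  private final OutputWidthExpression outputExpression;
  private final WidthType widthType;
  private final OutputColumnType outputColumnType;
  private final ValueVector outputVV; // for transfers, this is the transfer src

  ColumnWidthInfo(OutputWidthExpression outputExpression,
                  OutputColumnType outputColumnType,
                  WidthType widthType,
                  ValueVector outputVV) {
    this.outputExpression = outputExpression;
    this.outputColumnType = outputColumnType;
    this.widthType = widthType;
    this.outputVV = outputVV;
  }

  public OutputWidthExpression getOutputExpression() { return outputExpression; }

  public OutputColumnType getOutputColumnType() { return outputColumnType; }

  public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
}
```

Callers that currently pass ```-1``` as the width would simply drop that argument.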


[GitHub] [drill] ihuzenko commented on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571034273
 
 
   Hello @paul-rogers, if you don't want to extract the ```VectorState``` class, then please at least put the related fields into it, so that it is visible which state the class manages. Extraction is still the better way for me, since it prevents anybody from easily mixing access to the related fields (```transfers, allocationVectors, complexWriters, complexFieldReferences etc...```). PS. There is no quick solution here that makes everything look good without serious rewriting :)


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362497826
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
 Review comment:
   Since the method is already outdated (DICT is missing), I would suggest replacing its usages with
   ```java
   Types.isComplex(type) || Types.isUnion(type)
   ```
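
To illustrate why the hand-written list drifts, here is a small self-contained sketch. The `MinorType` enum is a cut-down stand-in for Drill's, and `isComplexShared` only models the assumed behavior of the suggested shared helpers (that they cover DICT); it is not their actual implementation.

```java
// Cut-down stand-in for Drill's MinorType; only the values needed for the sketch.
enum MinorType { INT, VARCHAR, MAP, LIST, UNION, DICT }

final class ComplexTypeCheck {
  // Mirrors the hand-written check in the diff: DICT is not listed.
  static boolean isComplexLocal(MinorType type) {
    return type == MinorType.MAP || type == MinorType.UNION || type == MinorType.LIST;
  }

  // Assumed behavior of a shared helper that is updated when new complex types appear.
  static boolean isComplexShared(MinorType type) {
    switch (type) {
      case MAP:
      case LIST:
      case DICT:
      case UNION:
        return true;
      default:
        return false;
    }
  }
}
```

With the local check, a DICT column would fall through to the variable-width path instead of the complex-field estimate, which is the kind of drift the suggestion avoids.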


[GitHub] [drill] paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364554593
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/record/TypedFieldId.java
 ##########
 @@ -241,7 +241,7 @@ public TypedFieldId build() {
         secondaryFinal = finalType;
       }
 
-      MajorType actualFinalType = finalType;
+      //MajorType actualFinalType = finalType;
 
 Review comment:
   Not sure: I don't know what the comments here are about, so I thought I'd leave them. I'm just commenting out an unused variable.


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362501279
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
 Review comment:
   Since the method has only one usage, and an assertion is already present before the call, please replace the method with a direct ```((FixedWidthVector) vv).getValueWidth()``` call.
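
   A minimal sketch of the inlined call, assuming a single call site that already asserts the vector type. The ValueVector and FixedWidthVector declarations are simplified stand-ins for the Drill interfaces; IntVectorStub, InlineWidthSketch and fixedWidthOf are hypothetical names used only for illustration:

   ```java
   // Simplified stand-ins for the Drill vector interfaces (assumption: only
   // getValueWidth() matters for this sketch).
   interface ValueVector {}

   interface FixedWidthVector extends ValueVector {
     int getValueWidth();
   }

   // Hypothetical 4-byte vector, present only to exercise the sketch.
   class IntVectorStub implements FixedWidthVector {
     @Override
     public int getValueWidth() { return 4; }
   }

   class InlineWidthSketch {
     // Hypothetical caller that used to delegate to getNetWidthOfFixedWidthType(vv).
     static int fixedWidthOf(ValueVector vv) {
       assert vv instanceof FixedWidthVector;
       // Direct call: the assertion above already guards the cast.
       return ((FixedWidthVector) vv).getValueWidth();
     }
   }
   ```

   With the helper gone, the cast sits next to the assertion that justifies it.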


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362486603
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
 
 Review comment:
   After a close look at the code, I believe this enum is unnecessary. All calls to the ```ColumnWidthInfo``` constructor pass ```WidthType.VARIABLE```, and the update method contains this block:
   ```java
         if (columnWidthInfo.isFixedWidth()) {
           // fixed width columns are accumulated in totalFixedWidthColumnWidth
           ShouldNotReachHere();
         } else {...}
   ```
   Please remove the enum and related code. 
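
   A rough sketch of how the class could look afterwards, assuming every instance describes a variable-width column. OutputWidthExpression is reduced to an empty stand-in here, and the outputVV field is left out for brevity:

   ```java
   // Empty stand-in for Drill's OutputWidthExpression (assumption for this sketch).
   interface OutputWidthExpression {}

   enum OutputColumnType { TRANSFER, NEW }

   // ColumnWidthInfo without the WidthType field and without isFixedWidth():
   // fixed-width columns never reach this class, so there is nothing to distinguish.
   class ColumnWidthInfo {
     private final OutputWidthExpression outputExpression;
     private final OutputColumnType outputColumnType;
     private final int width;

     ColumnWidthInfo(OutputWidthExpression outputExpression,
                     OutputColumnType outputColumnType,
                     int width) {
       this.outputExpression = outputExpression;
       this.outputColumnType = outputColumnType;
       this.width = width;
     }

     OutputWidthExpression getOutputExpression() { return outputExpression; }

     OutputColumnType getOutputColumnType() { return outputColumnType; }

     int getWidth() { return width; }
   }
   ```

   The fixed-width branch in the update method, and its ShouldNotReachHere() call, would then go away as well.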
   


[GitHub] [drill] paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364554333
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectBatchBuilder.java
 ##########
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.TypedFieldId;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+public class ProjectBatchBuilder implements ProjectionMaterializer.BatchBuilder {
+  private final ProjectRecordBatch projectBatch;
+  private final VectorContainer container;
+  private final SchemaChangeCallBack callBack;
+  private final RecordBatch incomingBatch;
+  final List<TransferPair> transfers = new ArrayList<>();
 
 Review comment:
   Made private. Added getter. But left the initializer with the field, since this is a final member and we often initialize such objects this way to signal that the field does not depend on constructor input.
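
   To illustrate the convention, here is a small sketch with hypothetical names (BatchBuilderSketch, transfers(), label); TransferPair is an empty stand-in for the Drill interface, and the actual getter name in the PR may differ:

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Empty stand-in for Drill's TransferPair (assumption for this sketch).
   interface TransferPair {}

   class BatchBuilderSketch {
     // Initialized at the declaration: the value does not depend on constructor input.
     private final List<TransferPair> transfers = new ArrayList<>();

     // Assigned in the constructor: the value comes from the caller.
     private final String label;

     BatchBuilderSketch(String label) {
       this.label = label;
     }

     // Hypothetical getter replacing direct access to the field.
     List<TransferPair> transfers() { return transfers; }

     String label() { return label; }
   }
   ```

   A reader can tell from the declarations alone which fields are caller-supplied.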


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364288964
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/record/TypedFieldId.java
 ##########
 @@ -241,7 +241,7 @@ public TypedFieldId build() {
         secondaryFinal = finalType;
       }
 
-      MajorType actualFinalType = finalType;
+      //MajorType actualFinalType = finalType;
 
 Review comment:
   Should we keep this commented-out code?


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362506622
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
 
 Review comment:
   Please replace the method with a shorter version: 
   ```java
     public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
       Preconditions.checkArgument(!Types.isVarWidthType(majorType.getMinorType()),
           "Expected fixed type but was '%s'.", majorType.getMinorType());
       return TypeHelper.getSize(majorType);
     }
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362452329
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
 
 Review comment:
   ```suggestion
    * fixed-width fields are just accumulated into a single total. Note: The PMM,
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362495939
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
 Review comment:
   ```suggestion
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362499191
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
 Review comment:
   ```suggestion
     private static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362493497
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
 
 Review comment:
   ```suggestion
     private static class VariableWidthColumnInfo {
   ```


[GitHub] [drill] paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-572380826
 
 
   Thank you @ihuzenko and @arina-ielchiieva for your reviews. Addressed remaining minor issues. Squashed commits.
   
   @ihuzenko, your many suggestions made this a much better solution. It is almost, but not quite, clean enough that I could write code gen unit tests. Once that pesky `ExpressionTreeMaterializer` issue is cleaned up, we'll be able to write such tests.


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364208438
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/OutputWidthVisitor.java
 ##########
 @@ -43,238 +43,213 @@
 
 import java.util.ArrayList;
 
-public class OutputWidthVisitor extends AbstractExecExprVisitor<OutputWidthExpression, OutputWidthVisitorState,
-        RuntimeException> {
-
-    @Override
-    public OutputWidthExpression visitVarDecimalConstant(VarDecimalExpression varDecimalExpression,
-                                                         OutputWidthVisitorState state) throws RuntimeException {
-        Preconditions.checkArgument(varDecimalExpression.getMajorType().hasPrecision());
-        return new FixedLenExpr(varDecimalExpression.getMajorType().getPrecision());
-    }
-
-
-    /**
-     *
-     * Records the {@link IfExpression} as a {@link IfElseWidthExpr}. IfElseWidthExpr will be reduced to
-     * a {@link FixedLenExpr} by taking the max of the if-expr-width and the else-expr-width.
-     *
-     * @param ifExpression
-     * @param state
-     * @return IfElseWidthExpr
-     * @throws RuntimeException
-     */
-    @Override
-    public OutputWidthExpression visitIfExpression(IfExpression ifExpression, OutputWidthVisitorState state)
-                                                                    throws RuntimeException {
-        IfExpression.IfCondition condition = ifExpression.ifCondition;
-        LogicalExpression ifExpr = condition.expression;
-        LogicalExpression elseExpr = ifExpression.elseExpression;
-
-        OutputWidthExpression ifWidthExpr = ifExpr.accept(this, state);
-        OutputWidthExpression elseWidthExpr = null;
-        if (elseExpr != null) {
-            elseWidthExpr = elseExpr.accept(this, state);
-        }
-        return new IfElseWidthExpr(ifWidthExpr, elseWidthExpr);
+public class OutputWidthVisitor
+    extends AbstractExecExprVisitor<OutputWidthExpression,
+            OutputWidthVisitorState,
+            RuntimeException> {
 
 Review comment:
   Please reformat to: 
   
   ```java
   public class OutputWidthVisitor extends
       AbstractExecExprVisitor<OutputWidthExpression, OutputWidthVisitorState, RuntimeException> {
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362452176
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
 
 Review comment:
   ```suggestion
    * registers the field with PMM. If the field is a variable-width field, PMM
   ```
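
The two phases described in the Javadoc above can be sketched in miniature. This is only an illustration under stated assumptions: `WidthExprSketch` and its implementations are hypothetical names, not Drill's `OutputWidthExpression` classes, and a plain `Map` of observed column widths stands in for whatever `RecordBatchSizer` reports. The if/else rule (take the wider branch) follows the `visitIfExpression` Javadoc quoted elsewhere in this thread.

```java
import java.util.Map;

// Hypothetical miniature of a width-expression tree; not Drill's actual classes.
interface WidthExprSketch {
  // Execution phase: reduce this node to a concrete width in bytes.
  int reduce(Map<String, Integer> observedWidths);
}

// A width already known at setup time, e.g. a fixed-width column.
class FixedWidthSketch implements WidthExprSketch {
  private final int width;

  FixedWidthSketch(int width) {
    this.width = width;
  }

  @Override
  public int reduce(Map<String, Integer> observedWidths) {
    return width;
  }
}

// A variable-width input column whose width is only known per batch.
class ColumnWidthSketch implements WidthExprSketch {
  private final String column;

  ColumnWidthSketch(String column) {
    this.column = column;
  }

  @Override
  public int reduce(Map<String, Integer> observedWidths) {
    Integer width = observedWidths.get(column);
    if (width == null) {
      throw new IllegalStateException("No observed width for column " + column);
    }
    return width;
  }
}

// An if/else node reduces to the wider of its two branches.
class IfElseWidthSketch implements WidthExprSketch {
  private final WidthExprSketch ifBranch;
  private final WidthExprSketch elseBranch;

  IfElseWidthSketch(WidthExprSketch ifBranch, WidthExprSketch elseBranch) {
    this.ifBranch = ifBranch;
    this.elseBranch = elseBranch;
  }

  @Override
  public int reduce(Map<String, Integer> observedWidths) {
    return Math.max(ifBranch.reduce(observedWidths), elseBranch.reduce(observedWidths));
  }
}
```

The tree is built once during setup and reduced again for each incoming batch, so only the observed widths change between batches.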


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364210835
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/OutputWidthVisitor.java
 ##########
 @@ -43,238 +43,213 @@
 
 import java.util.ArrayList;
 
-public class OutputWidthVisitor extends AbstractExecExprVisitor<OutputWidthExpression, OutputWidthVisitorState,
-        RuntimeException> {
-
-    @Override
-    public OutputWidthExpression visitVarDecimalConstant(VarDecimalExpression varDecimalExpression,
-                                                         OutputWidthVisitorState state) throws RuntimeException {
-        Preconditions.checkArgument(varDecimalExpression.getMajorType().hasPrecision());
-        return new FixedLenExpr(varDecimalExpression.getMajorType().getPrecision());
-    }
-
-
-    /**
-     *
-     * Records the {@link IfExpression} as a {@link IfElseWidthExpr}. IfElseWidthExpr will be reduced to
-     * a {@link FixedLenExpr} by taking the max of the if-expr-width and the else-expr-width.
-     *
-     * @param ifExpression
-     * @param state
-     * @return IfElseWidthExpr
-     * @throws RuntimeException
-     */
-    @Override
-    public OutputWidthExpression visitIfExpression(IfExpression ifExpression, OutputWidthVisitorState state)
-                                                                    throws RuntimeException {
-        IfExpression.IfCondition condition = ifExpression.ifCondition;
-        LogicalExpression ifExpr = condition.expression;
-        LogicalExpression elseExpr = ifExpression.elseExpression;
-
-        OutputWidthExpression ifWidthExpr = ifExpr.accept(this, state);
-        OutputWidthExpression elseWidthExpr = null;
-        if (elseExpr != null) {
-            elseWidthExpr = elseExpr.accept(this, state);
-        }
-        return new IfElseWidthExpr(ifWidthExpr, elseWidthExpr);
+public class OutputWidthVisitor
+    extends AbstractExecExprVisitor<OutputWidthExpression,
+            OutputWidthVisitorState,
+            RuntimeException> {
+
+  @Override
+  public OutputWidthExpression visitVarDecimalConstant(VarDecimalExpression varDecimalExpression,
+                                                       OutputWidthVisitorState state) throws RuntimeException {
+    Preconditions.checkArgument(varDecimalExpression.getMajorType().hasPrecision());
+    return new FixedLenExpr(varDecimalExpression.getMajorType().getPrecision());
+  }
+
+  /**
+   *
 
 Review comment:
   ```suggestion
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364218502
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectBatchBuilder.java
 ##########
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.TypedFieldId;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+public class ProjectBatchBuilder implements ProjectionMaterializer.BatchBuilder {
+  private final ProjectRecordBatch projectBatch;
+  private final VectorContainer container;
+  private final SchemaChangeCallBack callBack;
+  private final RecordBatch incomingBatch;
+  final List<TransferPair> transfers = new ArrayList<>();
 
 Review comment:
   Please also make this field private, add a getter, and move the `= new ArrayList<>()` initialization into the constructor.
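
A minimal sketch of the shape being requested, assuming nothing about the rest of `ProjectBatchBuilder`. `TransferPairStub`, `TransferListSketch`, `addTransfer`, and `getTransfers` are hypothetical names used only so the example compiles on its own; they are not the final API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Drill's TransferPair, so the sketch is self-contained.
class TransferPairStub {
  final String name;

  TransferPairStub(String name) {
    this.name = name;
  }
}

// Illustrative only: mirrors the requested shape, not the real ProjectBatchBuilder.
class TransferListSketch {
  // Private and final: other classes can no longer reach the field directly.
  private final List<TransferPairStub> transfers;

  TransferListSketch() {
    // Initialization moved from the field declaration into the constructor.
    this.transfers = new ArrayList<>();
  }

  void addTransfer(TransferPairStub pair) {
    transfers.add(pair);
  }

  // Getter (assumed name); returns the live list so the caller can iterate it.
  List<TransferPairStub> getTransfers() {
    return transfers;
  }
}
```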


[GitHub] [drill] paul-rogers edited a comment on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers edited a comment on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571235610
 
 
   @ihuzenko, the goal of this refactoring was simply to pull out the code gen part. `VectorState` is an admitted hack that creates a clean interface on the code gen (CG) side, though it is messy on the operator side.
   
   I was really hoping to not change the execution part of the project operator in this PR in order to limit the scope of changes. Maybe I'll go with the interface route (discussed in an earlier note), which will help with the longer term goal outlined below. Let me play with the code to see how that approach might work.
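
As a rough illustration of what an interface route could look like (a sketch under assumptions, not the code in this PR): the code gen side talks only to a narrow interface, and the operator side implements it. All names below (`BatchBuilderSketch`, `MaterializerSketch`, `RecordingBuilder`, and the two methods) are hypothetical, and the "bare column name means transfer" rule is a toy stand-in for the real projection planning.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical narrow interface: the only view of the operator that code gen sees.
interface BatchBuilderSketch {
  void addTransferField(String name);

  void addEvalField(String name);
}

// Code gen side: plans the projection without touching vectors or batches.
class MaterializerSketch {
  private final BatchBuilderSketch builder;

  MaterializerSketch(BatchBuilderSketch builder) {
    this.builder = builder;
  }

  // Toy rule for illustration: a bare column name is a transfer,
  // anything else is treated as an expression to evaluate.
  void plan(List<String> exprs) {
    for (String expr : exprs) {
      if (expr.matches("[A-Za-z_][A-Za-z0-9_]*")) {
        builder.addTransferField(expr);
      } else {
        builder.addEvalField(expr);
      }
    }
  }
}

// Test double standing in for the operator side; it only records the calls.
class RecordingBuilder implements BatchBuilderSketch {
  final List<String> calls = new ArrayList<>();

  @Override
  public void addTransferField(String name) {
    calls.add("transfer:" + name);
  }

  @Override
  public void addEvalField(String name) {
    calls.add("eval:" + name);
  }
}
```

With a split like this, the planning logic can be exercised in a unit test against a recording implementation, with no operator or vectors involved.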


[GitHub] [drill] paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571235610
 
 
   @ihuzenko, the goal of this refactoring was simply to pull out the code gen part. `VectorState` is an admitted hack that creates a clean interface on the code gen (CG) side, though it is messy on the operator side.
   
   Since we don't like this approach, for now, I'll remove `VectorState` and simply move its methods onto the project operator, and pass the project operator to the CG component. Not ideal, but we can revisit the issue later when we want to do further cleanup.


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362491665
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
 
 Review comment:
   ```suggestion
     private enum OutputColumnType {
   ```

[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362497334
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
 Review comment:
   Since ```getIncomingBatch()``` already exists in the class, please remove this duplicate ```incomingBatch()``` method and replace its usages with ```getIncomingBatch()```.
   
   ```suggestion
   ```

[GitHub] [drill] paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571437864
 
 
   @ihuzenko, I refactored a few more pieces to adopt the solution you suggested. This version still uses the interface, with an implementation in a class other than the project record batch. This turns out to be handy because, oddly, the project operator generates code for two separate incoming batches: the "real" one and a "fake" empty one. The `ProjectRecordBatchBuilder` holds onto the input batch so we don't have to pass it into the materializer and back out.
   
   This version tries to eliminate all references to the incoming batch in the materializer, and instead work with the batch schema. Annoyingly, the `ExpressionTreeMaterializer` needs the input batch so it can iterate over the vectors to get their schemas. If all we need is the schema, we don't need actual vectors. So, if we can pass in a schema, we can completely separate code gen from physical vectors.
   
   The next refactoring move is to change this code to work with a schema (or interface to obtain the schema) rather than the actual vectors. Now, as it turns out, the batch schema has limitations for complex types, which is one of the reasons we created the `TupleMetadata` family of classes. So, perhaps we can convert the incoming batch to a `TupleMetadata` schema and use that. (The code to do that already exists in the `RowSet` classes.)
   
   Or, we can just pass an interface which will return the `TypedFieldId` for each column. Or, do that conversion ahead of time, and pass in the results. Will have to play with it some to see which solution is the simplest.
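
To make the interface option concrete, here is a minimal sketch of a materializer that depends only on a column resolver. It is an illustration under stated assumptions, not Drill code: `ColumnResolver`, `SchemaColumnResolver`, `FieldRef`, and `describe` are hypothetical names, and `FieldRef` is only a stand-in for Drill's `TypedFieldId`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types exist in Drill.
final class SchemaResolverSketch {

  /** Stand-in for Drill's TypedFieldId: only a column index and a type name. */
  static final class FieldRef {
    final int index;
    final String typeName;

    FieldRef(int index, String typeName) {
      this.index = index;
      this.typeName = typeName;
    }
  }

  /** What code gen would ask for instead of the incoming batch. */
  interface ColumnResolver {
    Optional<FieldRef> resolve(String columnName);
  }

  /** Resolver backed by schema information only (name and type), with no vectors. */
  static final class SchemaColumnResolver implements ColumnResolver {
    private final Map<String, FieldRef> fields = new LinkedHashMap<>();

    SchemaColumnResolver add(String name, String typeName) {
      if (fields.containsKey(name)) {
        throw new IllegalArgumentException("duplicate column: " + name);
      }
      // The index is the column's position in insertion order.
      fields.put(name, new FieldRef(fields.size(), typeName));
      return this;
    }

    @Override
    public Optional<FieldRef> resolve(String columnName) {
      return Optional.ofNullable(fields.get(columnName));
    }
  }

  /** Stand-in for the materializer: it sees the resolver, never a batch. */
  static String describe(ColumnResolver resolver, String columnName) {
    return resolver.resolve(columnName)
        .map(f -> columnName + " -> #" + f.index + " " + f.typeName)
        .orElse(columnName + " -> unresolved");
  }

  public static void main(String[] args) {
    ColumnResolver resolver = new SchemaColumnResolver()
        .add("a", "INT")
        .add("b", "VARCHAR");
    System.out.println(describe(resolver, "b"));
    System.out.println(describe(resolver, "c"));
  }
}
```

With this shape, the same materializer code could serve both the "real" and the "fake" empty batch by swapping the resolver, and a resolver built ahead of time from the incoming batch would cover the "convert first, pass in the results" variant.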
   
   Since the required work will be rather large, I propose we do that as a separate PR.
   
   Have we done enough for this one PR?

[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362500067
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
+                String inputColumnName, String outputColumnName) {
+    if(isFixedWidth(vv)) {
+      addFixedWidthField(vv);
+    } else {
+      addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
     }
-
-    public static String printVV(ValueVector vv) {
-        String str = "null";
-        if (vv != null) {
-            str = vv.getField().getName() + " " + vv.getField().getType();
-        }
-        return str;
+  }
+
+  private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
+                                     OutputColumnType outputColumnType, String inputColumnName,
+                                     String outputColumnName) {
+    variableWidthColumnCount++;
+    ColumnWidthInfo columnWidthInfo;
+    logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
+            printVV(vv), variableWidthColumnCount, outputColumnType);
+    // Variable width transfers
+    if (outputColumnType == OutputColumnType.TRANSFER) {
+      VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
+      columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
+    } else if (isComplex(vv.getField().getType())) {
+      addComplexField(vv);
+      return;
+    } else {
+      // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
+      OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+      OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
+      columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
     }
-
-    void addComplexField(ValueVector vv) {
-        //Complex types are not yet supported. Just use a guess for the size
-        assert vv == null || isComplex(vv.getField().getType());
-        complexColumnsCount++;
-        // just a guess
-        totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
-        logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+    ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
+    Preconditions.checkState(existingInfo == null);
+  }
+
+  public static String printVV(ValueVector vv) {
+    String str = "null";
+    if (vv != null) {
+      str = vv.getField().getName() + " " + vv.getField().getType();
     }
-
-    void addFixedWidthField(ValueVector vv) {
-        assert isFixedWidth(vv);
-        fixedWidthColumnCount++;
-        int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
-        totalFixedWidthColumnWidth += fixedFieldWidth;
-        logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+    return str;
+  }
+
+  void addComplexField(ValueVector vv) {
+    //Complex types are not yet supported. Just use a guess for the size
+    assert vv == null || isComplex(vv.getField().getType());
+    complexColumnsCount++;
+    // just a guess
+    totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
+    logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+  }
+
+  void addFixedWidthField(ValueVector vv) {
+    assert isFixedWidth(vv);
+    fixedWidthColumnCount++;
+    int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
 
 Review comment:
   ```suggestion
       int fixedFieldWidth = ((FixedWidthVector) vv).getValueWidth();
   ```
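
For context on the method under review, the accounting described in the class Javadoc can be sketched roughly as follows. This is a simplified, hypothetical model and not the Drill implementation: `WidthAccountingSketch`, `Kind`, and `setVariableWidth` are invented names, the value of `COMPLEX_FIELD_ESTIMATE` is a placeholder, and the real class resolves variable widths by evaluating `OutputWidthExpression` trees rather than taking a number directly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the PMM accounting; not Drill code.
final class WidthAccountingSketch {

  // Assumed placeholder value. Drill reads its estimate from
  // OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE.
  static final int COMPLEX_FIELD_ESTIMATE = 50;

  enum Kind { FIXED, VARIABLE, COMPLEX }

  private int totalFixedWidth;
  private int totalComplexWidth;
  // Variable-width columns are only registered at setup (-1 = not yet known);
  // their widths are filled in per batch.
  private final Map<String, Integer> variableWidths = new HashMap<>();

  /** Setup phase: register one output column. */
  void addField(String name, Kind kind, int fixedWidth) {
    switch (kind) {
      case FIXED:
        // Fixed widths are simply accumulated into a single total.
        totalFixedWidth += fixedWidth;
        break;
      case COMPLEX:
        // Complex columns get a hard-coded guess.
        totalComplexWidth += COMPLEX_FIELD_ESTIMATE;
        break;
      case VARIABLE:
        if (variableWidths.put(name, -1) != null) {
          throw new IllegalStateException("duplicate column: " + name);
        }
        break;
    }
  }

  /** Execution phase: supply the width estimated for the current batch. */
  void setVariableWidth(String name, int width) {
    if (!variableWidths.containsKey(name)) {
      throw new IllegalArgumentException("unknown column: " + name);
    }
    variableWidths.put(name, width);
  }

  /** Estimated row width: fixed total + complex guess + variable widths. */
  int rowWidth() {
    int total = totalFixedWidth + totalComplexWidth;
    for (int w : variableWidths.values()) {
      if (w < 0) {
        throw new IllegalStateException("unresolved variable width");
      }
      total += w;
    }
    return total;
  }

  public static void main(String[] args) {
    WidthAccountingSketch pmm = new WidthAccountingSketch();
    pmm.addField("id", Kind.FIXED, 4);
    pmm.addField("name", Kind.VARIABLE, 0);
    pmm.setVariableWidth("name", 20);
    System.out.println(pmm.rowWidth());
  }
}
```

In this model the fixed-width branch is a single addition per column, which is the same role `addFixedWidthField()` plays in the diff above.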

[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362496119
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
 
 Review comment:
   ```suggestion
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364234843
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectionMaterializer.java
 ##########
 @@ -0,0 +1,625 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.commons.collections.map.CaseInsensitiveMap;
+import org.apache.drill.common.expression.ConvertExpression;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.ExpressionPosition;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.FunctionCallFactory;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.expression.PathSegment.NameSegment;
+import org.apache.drill.common.expression.fn.FunctionReplacementUtils;
+import org.apache.drill.common.logical.data.NamedExpression;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.expr.DrillFuncHolderExpr;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.expr.fn.FunctionLookupContext;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.StarColumnHelper;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.carrotsearch.hppc.IntHashSet;
+
+/**
+ * Plans the projection given the incoming and requested outgoing schemas. Works
+ * with the {@link VectorState} to create required vectors, writers and so on.
 
 Review comment:
   Please update this comment: it still links to ```VectorState```, which is gone.


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362496478
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
 Review comment:
   please remove


[GitHub] [drill] paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-570465320
 
 
   @ihuzenko, thanks for your thorough review, as usual. Here is my general philosophy on refactoring: do it in bite-size chunks. The first step here was simply to break code gen out of the operator itself. Because the bulk of the code is unchanged, one can review it by diffing the old and new files (not in GitHub, sadly) and see that blocks of code are identical and have simply been moved, in some cases from one huge function into smaller functions. Then, once we're satisfied that this simple refactoring works, we can think about the next step.
   
   If you agree with the step-by-step approach, then perhaps we can commit this first step to put us in position to make subsequent changes.


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362495075
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
 
 Review comment:
   Actually, this field and its getter aren't used, but if you think they could be useful for debugging, they can be left as is.


[GitHub] [drill] ihuzenko commented on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-570618977
 
 
   @paul-rogers thanks for the quick update. I agree that reusing already generated and compiled code whenever possible would be very good in the long run. I also agree with the step-by-step refactoring approach. With respect to this PR, I propose extracting ```VectorState``` into a separate class, together with all of the ```ProjectRecordBatch``` fields that can be encapsulated. I believe that extracting ```VectorState``` will prevent mixed access to those fields in the future and open up more options for later improvements. To illustrate the idea, I created a sample [commit](https://github.com/ihuzenko/drill/commit/80fb0065cf4d11b836383b031dffc07b92a5ad91). Sorry for the late reply.


[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364289943
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectionMaterializer.java
 ##########
 @@ -0,0 +1,625 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.commons.collections.map.CaseInsensitiveMap;
+import org.apache.drill.common.expression.ConvertExpression;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.ExpressionPosition;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.FunctionCallFactory;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.expression.PathSegment.NameSegment;
+import org.apache.drill.common.expression.fn.FunctionReplacementUtils;
+import org.apache.drill.common.logical.data.NamedExpression;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.expr.DrillFuncHolderExpr;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.expr.fn.FunctionLookupContext;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.StarColumnHelper;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.carrotsearch.hppc.IntHashSet;
+
+/**
+ * Plans the projection given the incoming and requested outgoing schemas. Works
+ * with the {@link VectorState} to create required vectors, writers and so on.
+ * Populates the code generator with the "projector" expressions.
+ */
+class ProjectionMaterializer {
+  private static final Logger logger = LoggerFactory.getLogger(ProjectionMaterializer.class);
+  private static final String EMPTY_STRING = "";
+
+  /**
+   * Abstracts the physical vector setup operations to separate
+   * the physical setup, in <code>ProjectRecordBatch</code>, from the
+   * logical setup in the materializer class.
+   */
+  public interface BatchBuilder {
+    void addTransferField(String name, ValueVector vvIn);
+    ValueVectorWriteExpression addOutputVector(String name, LogicalExpression expr);
+    int addDirectTransfer(FieldReference ref, ValueVectorReadExpression vectorRead);
+    void addComplexField(FieldReference ref);
+    ValueVectorWriteExpression addEvalVector(String outputName,
+        LogicalExpression expr);
+  }
+
+  private static class ClassifierResult {
+    private boolean isStar;
+    private List<String> outputNames;
+    private String prefix = "";
+    private final HashMap<String, Integer> prefixMap = Maps.newHashMap();
+    private final CaseInsensitiveMap outputMap = new CaseInsensitiveMap();
+    private final CaseInsensitiveMap sequenceMap = new CaseInsensitiveMap();
+
+    private void clear() {
+      isStar = false;
+      prefix = "";
+      if (outputNames != null) {
+        outputNames.clear();
+      }
+
+      // note: don't clear the internal maps since they have cumulative data..
+    }
+  }
+
+  private final ClassGenerator<Projector> cg;
+  private final VectorAccessible incomingBatch;
+  private final BatchSchema incomingSchema;
+  private final List<NamedExpression> exprSpec;
+  private final FunctionLookupContext functionLookupContext;
+  private final BatchBuilder batchBuilder;
+  private final boolean unionTypeEnabled;
+  private final ErrorCollector collector = new ErrorCollectorImpl();
+  private final ColumnExplorer columnExplorer;
+  private final IntHashSet transferFieldIds = new IntHashSet();
+  private final ProjectionMaterializer.ClassifierResult result = new ClassifierResult();
+  private boolean isAnyWildcard;
+  private boolean classify;
+
+  public ProjectionMaterializer(OptionManager options,
+      VectorAccessible incomingBatch, List<NamedExpression> exprSpec,
+      FunctionLookupContext functionLookupContext, BatchBuilder batchBuilder,
+      boolean unionTypeEnabled) {
+    this.incomingBatch = incomingBatch;
+    this.incomingSchema = incomingBatch.getSchema();
+    this.exprSpec = exprSpec;
+    this.functionLookupContext = functionLookupContext;
+    this.batchBuilder = batchBuilder;
+    this.unionTypeEnabled = unionTypeEnabled;
+    columnExplorer = new ColumnExplorer(options);
+    cg = CodeGenerator.getRoot(Projector.TEMPLATE_DEFINITION, options);
+  }
+
+  public Projector generateProjector(FragmentContext context, boolean saveCode)
+      throws ClassTransformationException, IOException, SchemaChangeException {
+    long setupNewSchemaStartTime = System.currentTimeMillis();
+    setup();
+    CodeGenerator<Projector> codeGen = cg.getCodeGenerator();
+    codeGen.plainJavaCapable(true);
+    codeGen.saveCodeForDebugging(saveCode);
+    Projector projector = context.getImplementationClass(codeGen);
+
+    long setupNewSchemaEndTime = System.currentTimeMillis();
+    logger.trace("generateProjector: time {}  ms, Project {}, incoming {}",
+             (setupNewSchemaEndTime - setupNewSchemaStartTime), exprSpec, incomingSchema);
+    return projector;
+  }
+
+  private void setup() throws SchemaChangeException {
+    List<NamedExpression> exprs = exprSpec != null ? exprSpec
+        : inferExpressions();
+    isAnyWildcard = isAnyWildcard(exprs);
+    classify = isClassificationNeeded(exprs);
+
+    for (NamedExpression namedExpression : exprs) {
+      setupExpression(namedExpression);
+    }
+  }
+
+  private List<NamedExpression> inferExpressions() {
+    List<NamedExpression> exprs = Lists.newArrayList();
+    for (MaterializedField field : incomingSchema) {
+      String fieldName = field.getName();
+      if (Types.isComplex(field.getType())
+          || Types.isRepeated(field.getType())) {
+        LogicalExpression convertToJson = FunctionCallFactory.createConvert(
+            ConvertExpression.CONVERT_TO, "JSON",
+            SchemaPath.getSimplePath(fieldName), ExpressionPosition.UNKNOWN);
+        String castFuncName = FunctionReplacementUtils
+            .getCastFunc(MinorType.VARCHAR);
+        List<LogicalExpression> castArgs = Lists.newArrayList();
+        castArgs.add(convertToJson); // input_expr
+        // Implicitly casting to varchar, since we don't know actual source
+        // length, cast to undefined length, which will preserve source length
+        castArgs.add(new ValueExpressions.LongExpression(
+            Types.MAX_VARCHAR_LENGTH, null));
+        FunctionCall castCall = new FunctionCall(castFuncName, castArgs,
+            ExpressionPosition.UNKNOWN);
+        exprs.add(new NamedExpression(castCall, new FieldReference(fieldName)));
+      } else {
+        exprs.add(new NamedExpression(SchemaPath.getSimplePath(fieldName),
+            new FieldReference(fieldName)));
+      }
+    }
+    return exprs;
+  }
+
+  private boolean isAnyWildcard(List<NamedExpression> exprs) {
+    for (NamedExpression e : exprs) {
+      if (isWildcard(e)) {
+        return true;
+      }
+    }
+    return false;
+  }
+
+  private boolean isWildcard(NamedExpression ex) {
+    if (!(ex.getExpr() instanceof SchemaPath)) {
+      return false;
+    }
+    NameSegment expr = ((SchemaPath) ex.getExpr()).getRootSegment();
+    return expr.getPath().contains(SchemaPath.DYNAMIC_STAR);
+  }
+
+  private boolean isClassificationNeeded(List<NamedExpression> exprs) {
+    boolean needed = false;
+    for (NamedExpression ex : exprs) {
+      if (!(ex.getExpr() instanceof SchemaPath)) {
+        continue;
+      }
+      NameSegment expr = ((SchemaPath) ex.getExpr()).getRootSegment();
+      NameSegment ref = ex.getRef().getRootSegment();
+      boolean refHasPrefix = ref.getPath()
+          .contains(StarColumnHelper.PREFIX_DELIMITER);
+      boolean exprContainsStar = expr.getPath()
+          .contains(SchemaPath.DYNAMIC_STAR);
+
+      if (refHasPrefix || exprContainsStar) {
+        needed = true;
+        break;
+      }
+    }
+    return needed;
+  }
+
+  private void setupExpression(NamedExpression namedExpression)
+      throws SchemaChangeException {
+    result.clear();
+    if (classify && namedExpression.getExpr() instanceof SchemaPath) {
+      classifyExpr(namedExpression, result);
+
+      if (result.isStar) {
+        setupImplicitColumnRef(namedExpression);
+        return;
+      }
+    } else {
+      // For the columns which do not needed to be classified,
+      // it is still necessary to ensure the output column name is unique
+      result.outputNames = Lists.newArrayList();
+      String outputName = getRef(namedExpression).getRootSegment().getPath(); // moved
+                                                                              // to
+                                                                              // before
+                                                                              // the
+                                                                              // if
 
 Review comment:
   maybe have all this on one line?


[GitHub] [drill] paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-570465345
 
 
   
   Longer term, it seems we might want to restructure how we generate code. Today, if we run a query across, say, 20 minor fragments in the same Drillbit, and they all see the same schema (the normal case), then all 20 will generate exactly the same code. Then, down in the compiler layer, the first thread will compile the code and put it into the code cache. Threads 2 through 20 will get the compiled code from the cache.
   
   But, today, the process is **very** inefficient. Each thread:
   
   * Does the semantic expression analysis (function lookup, type resolution, etc.)
   * Generates the entire Java code.
   * Rewrites the Java code to remove the varying bits (the generated class name).
   * Hashes the entire code to get a hash code to look up in the code cache.
   * If a match is found, compares the entire code block byte-by-byte to verify a match.
   * If new, generates code, caches it, and uses the source code (which can be huge) as the cache key.
   
   The only real benefit of this approach is that it has worked all these years. 
   
   The better approach is to:
   
   * Create a parameterized "descriptor" object that holds all the factors needed to generate the code. (Input schema, expressions, relevant session options.)
   * Use that descriptor as a key into the code lookup table. If found, reuse the compiled code without any code gen.
   * If not found, only then tell the descriptor to generate the needed code, which will then be shared by all fragments.
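   
   A rough sketch of the descriptor idea above, just to make the shape concrete. Every name here (`CodeGenDescriptor`, `DescriptorCodeCache`, `generateSource`) is hypothetical and is not an existing Drill class or API; strings stand in for the real schema and expression objects, and the "compiler" is any function from source to a compiled artifact. The point is only that the lookup key is the small descriptor, and code gen runs solely on a cache miss.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical: holds every factor that influences the generated code.
final class CodeGenDescriptor {
  private final List<String> inputSchema;     // e.g. "a:INT" (stand-in for a real schema)
  private final List<String> expressions;     // stand-in for the projection expressions
  private final Map<String, String> options;  // relevant session options

  CodeGenDescriptor(List<String> inputSchema, List<String> expressions,
      Map<String, String> options) {
    this.inputSchema = Collections.unmodifiableList(new ArrayList<>(inputSchema));
    this.expressions = Collections.unmodifiableList(new ArrayList<>(expressions));
    this.options = Collections.unmodifiableMap(new HashMap<>(options));
  }

  // Placeholder for the expensive step: semantic analysis plus Java code gen.
  String generateSource() {
    return "// project " + expressions + " over " + inputSchema + " with " + options;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof CodeGenDescriptor)) {
      return false;
    }
    CodeGenDescriptor other = (CodeGenDescriptor) o;
    return inputSchema.equals(other.inputSchema)
        && expressions.equals(other.expressions)
        && options.equals(other.options);
  }

  @Override
  public int hashCode() {
    return Objects.hash(inputSchema, expressions, options);
  }
}

// Hypothetical cache keyed by descriptor rather than by generated source text.
final class DescriptorCodeCache<C> {
  private final ConcurrentHashMap<CodeGenDescriptor, C> cache = new ConcurrentHashMap<>();
  private final Function<String, C> compiler;
  private final AtomicInteger compileCount = new AtomicInteger();

  DescriptorCodeCache(Function<String, C> compiler) {
    this.compiler = compiler;
  }

  // Code gen and compilation happen only when the descriptor is not yet cached.
  C get(CodeGenDescriptor descriptor) {
    return cache.computeIfAbsent(descriptor, d -> {
      compileCount.incrementAndGet();
      return compiler.apply(d.generateSource());
    });
  }

  int compileCount() {
    return compileCount.get();
  }
}
```

   With something like this, 20 fragments that build equal descriptors would share one compiled result, and 19 of them would skip code gen entirely. Whether a real descriptor can capture every factor that affects the generated code is the open question.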
   
   The work I did back in the managed sort, and that I'm starting here, at least splits code gen from the operator.
   
   I suspect one (very long term) improvement would be to introduce another layer of abstraction like we had in Impala. The code gen code tries to do the kind of logical type analysis normally done in a planner. But, because the goal is Java code gen, it tends to mix SQL type analysis with Java implementation details, leading to overly complex code. (That's what I'm fighting with the typeof/UNION issue.)
   
   Such an approach would be doubly useful as we roll out the schema improvements your team has been doing. If we know the schema (types) at plan time, we can work out all the type conversion stuff at that time. In fact, we can even play the Spark trick: generate Java once in the planner and send it to the workers.
   
   I have only vague ideas here; have not spent much time on it. Sounds like you've looked at this some. What do you suggest we do?


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362452251
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
 
 Review comment:
   ```suggestion
    * records the expression that produces the variable-width field. The expression
   ```


[GitHub] [drill] paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-570763354
 
 
   @ihuzenko, good suggestion on `VectorState`: my original thought was to make it a static class, then move it to its own file. Then I thought better of it. You'll notice that, at present, it is kind of a "shim" class to expose certain operations on the vector objects to the "materializer" without making the entire project operator visible. In particular, it is an inner class so it can tinker with project operator state.
   
   We could accomplish the same thing by defining an interface which `ProjectRecordBatch` would implement. But, in that case, all the complex setup logic would be back in the project operator, undoing the separation we just accomplished. So, the interface approach feels not quite right.
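   
   To make the two shapes concrete, here is a toy sketch. All names (`VectorSetup`, `Materializer`, `SketchProjectOperator`) are invented for illustration and do not match the actual classes or method signatures in this PR. The inner class plays the "shim" role: the planner sees only the narrow interface, while the shim can still reach operator state. The alternative discussed above would have the operator itself implement `VectorSetup`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical narrow view of the vector operations the planner needs.
interface VectorSetup {
  void addTransfer(String inputName, String outputName);
  void addNewColumn(String outputName);
}

// Hypothetical logical planner: decides what to do with each column,
// but knows nothing about the operator that owns the vectors.
final class Materializer {
  private final VectorSetup setup;

  Materializer(VectorSetup setup) {
    this.setup = setup;
  }

  void plan(List<String> passThrough, List<String> computed) {
    for (String name : passThrough) {
      setup.addTransfer(name, name);
    }
    for (String name : computed) {
      setup.addNewColumn(name);
    }
  }
}

// Hypothetical operator; the list of steps stands in for real vector state.
final class SketchProjectOperator {
  private final List<String> steps = new ArrayList<>();

  // Inner-class shim: mutates operator state without exposing the operator.
  private final class State implements VectorSetup {
    @Override
    public void addTransfer(String inputName, String outputName) {
      steps.add("transfer " + inputName + " -> " + outputName);
    }

    @Override
    public void addNewColumn(String outputName) {
      steps.add("new " + outputName);
    }
  }

  List<String> setup(List<String> passThrough, List<String> computed) {
    new Materializer(new State()).plan(passThrough, computed);
    return steps;
  }
}
```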
   
   I agree that this can be improved, but I don't yet see an obvious move that leaves the code simpler. So, I thought to just leave it as is for now.
   
   Maybe it can evolve to own the allocation vectors, complex writers and so on. These are the kinds of questions we can ask now that we can start to see the pieces rather than just a big mess. Suggestions?


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362496036
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
 Review comment:
   ```suggestion
   ```


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362534274
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
+                String inputColumnName, String outputColumnName) {
+    if(isFixedWidth(vv)) {
+      addFixedWidthField(vv);
+    } else {
+      addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
     }
-
-    public static String printVV(ValueVector vv) {
-        String str = "null";
-        if (vv != null) {
-            str = vv.getField().getName() + " " + vv.getField().getType();
-        }
-        return str;
+  }
+
+  private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
+                                     OutputColumnType outputColumnType, String inputColumnName,
+                                     String outputColumnName) {
+    variableWidthColumnCount++;
+    ColumnWidthInfo columnWidthInfo;
+    logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
+            printVV(vv), variableWidthColumnCount, outputColumnType);
+    // Variable width transfers
+    if (outputColumnType == OutputColumnType.TRANSFER) {
+      VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
+      columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
+    } else if (isComplex(vv.getField().getType())) {
+      addComplexField(vv);
+      return;
+    } else {
+      // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
+      OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+      OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
+      columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
     }
-
-    void addComplexField(ValueVector vv) {
-        //Complex types are not yet supported. Just use a guess for the size
-        assert vv == null || isComplex(vv.getField().getType());
-        complexColumnsCount++;
-        // just a guess
-        totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
-        logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+    ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
+    Preconditions.checkState(existingInfo == null);
+  }
+
+  public static String printVV(ValueVector vv) {
+    String str = "null";
+    if (vv != null) {
+      str = vv.getField().getName() + " " + vv.getField().getType();
     }
-
-    void addFixedWidthField(ValueVector vv) {
-        assert isFixedWidth(vv);
-        fixedWidthColumnCount++;
-        int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
-        totalFixedWidthColumnWidth += fixedFieldWidth;
-        logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+    return str;
+  }
+
+  void addComplexField(ValueVector vv) {
+    //Complex types are not yet supported. Just use a guess for the size
+    assert vv == null || isComplex(vv.getField().getType());
+    complexColumnsCount++;
+    // just a guess
+    totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
+    logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+  }
+
+  void addFixedWidthField(ValueVector vv) {
+    assert isFixedWidth(vv);
+    fixedWidthColumnCount++;
+    int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
+    totalFixedWidthColumnWidth += fixedFieldWidth;
+    logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+  }
+
+  public void init(RecordBatch incomingBatch, ProjectRecordBatch outgoingBatch) {
+    setIncomingBatch(incomingBatch);
+    setOutgoingBatch(outgoingBatch);
+    reset();
+
+    RecordBatchStats.printConfiguredBatchSize(outgoingBatch.getRecordBatchStatsContext(),
+      getOutputBatchSize());
+  }
+
+  private void reset() {
+    rowWidth = 0;
+    totalFixedWidthColumnWidth = 0;
+    totalComplexColumnWidth = 0;
+
+    fixedWidthColumnCount = 0;
+    complexColumnsCount = 0;
+  }
+
+  @Override
+  public void update() {
+    long updateStartTime = System.currentTimeMillis();
+    RecordBatchSizer batchSizer = new RecordBatchSizer(incomingBatch);
+    long batchSizerEndTime = System.currentTimeMillis();
+
+    setRecordBatchSizer(batchSizer);
+    rowWidth = 0;
+    int totalVariableColumnWidth = 0;
+    for (String outputColumnName : outputColumnSizes.keySet()) {
+      ColumnWidthInfo columnWidthInfo = outputColumnSizes.get(outputColumnName);
+      int width = -1;
+      if (columnWidthInfo.isFixedWidth()) {
+        // fixed width columns are accumulated in totalFixedWidthColumnWidth
+        ShouldNotReachHere();
+      } else {
+        //Walk the tree of OutputWidthExpressions to get a FixedLenExpr
+        //As the tree is walked, the RecordBatchSizer and function annotations
+        //are looked-up to come up with the final FixedLenExpr
+        OutputWidthExpression savedWidthExpr = columnWidthInfo.getOutputExpression();
+        OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+        OutputWidthExpression reducedExpr = savedWidthExpr.accept(new OutputWidthVisitor(), state);
+        width = ((FixedLenExpr)reducedExpr).getDataWidth();
+        Preconditions.checkState(width >= 0);
+        int metadataWidth = getMetadataWidth(columnWidthInfo.outputVV);
+        logger.trace("update(): fieldName {} width: {} metadataWidth: {}",
+                columnWidthInfo.outputVV.getField().getName(), width, metadataWidth);
+        width += metadataWidth;
+      }
+      totalVariableColumnWidth += width;
     }
-
-    public void init(RecordBatch incomingBatch, ProjectRecordBatch outgoingBatch) {
-        setIncomingBatch(incomingBatch);
-        setOutgoingBatch(outgoingBatch);
-        reset();
-
-        RecordBatchStats.printConfiguredBatchSize(outgoingBatch.getRecordBatchStatsContext(),
-          getOutputBatchSize());
+    rowWidth += totalFixedWidthColumnWidth;
+    rowWidth += totalComplexColumnWidth;
+    rowWidth += totalVariableColumnWidth;
+    int outPutRowCount;
+    if (rowWidth != 0) {
+      //if rowWidth is not zero, set the output row count in the sizer
+      setOutputRowCount(getOutputBatchSize(), rowWidth);
+      // if more rows can be allowed than the incoming row count, then set the
+      // output row count to the incoming row count.
+      outPutRowCount = Math.min(getOutputRowCount(), batchSizer.rowCount());
+    } else {
+      // if rowWidth == 0 then the memory manager does
+      // not have sufficient information to size the batch
+      // let the entire batch pass through.
+      // If incoming rc == 0, all RB Sizer look-ups will have
+      // 0 width and so total width can be 0
+      outPutRowCount = incomingBatch.getRecordCount();
     }
-
-    private void reset() {
-        rowWidth = 0;
-        totalFixedWidthColumnWidth = 0;
-        totalComplexColumnWidth = 0;
-
-        fixedWidthColumnCount = 0;
-        complexColumnsCount = 0;
+    setOutputRowCount(outPutRowCount);
+    long updateEndTime = System.currentTimeMillis();
+    logger.trace("update() : Output RC {}, BatchSizer RC {}, incoming RC {}, width {}, total fixed width {}"
+                + ", total variable width {}, total complex width {}, batchSizer time {} ms, update time {}  ms"
+                + ", manager {}, incoming {}",outPutRowCount, batchSizer.rowCount(), incomingBatch.getRecordCount(),
+                rowWidth, totalFixedWidthColumnWidth, totalVariableColumnWidth, totalComplexColumnWidth,
+                (batchSizerEndTime - updateStartTime),(updateEndTime - updateStartTime), this, incomingBatch);
+
+    RecordBatchStats.logRecordBatchStats(RecordBatchIOType.INPUT, getRecordBatchSizer(), outgoingBatch.getRecordBatchStatsContext());
+    updateIncomingStats();
+  }
+
+  public static int getMetadataWidth(ValueVector vv) {
+    int width = 0;
+    if (vv instanceof NullableVector) {
+      width += ((NullableVector)vv).getBitsVector().getPayloadByteCount(1);
 
 Review comment:
   ```suggestion
         width += ((NullableVector) vv).getBitsVector().getPayloadByteCount(1);
   ```
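
For readers following the hunk above: `update()` sums the fixed, complex and variable column widths into `rowWidth`, then caps the output row count at the incoming row count. If `rowWidth` is zero, it lets the whole batch through. A minimal sketch of that arithmetic follows. The class and method names are hypothetical. The plain integer division is an assumption standing in for `setOutputRowCount(getOutputBatchSize(), rowWidth)`, which may clamp or round differently in Drill. The sketch also uses a single incoming row count where the diff reads `batchSizer.rowCount()` in one branch and `incomingBatch.getRecordCount()` in the other.

```java
/** Illustrative only: mirrors the arithmetic in update(), not Drill's real classes. */
final class RowCountSketch {

  private RowCountSketch() { }

  /**
   * Estimates how many rows the output batch may hold.
   *
   * @param outputBatchSize  assumed byte budget for one output batch
   * @param fixedWidth       sum of the fixed-width column widths
   * @param complexWidth     sum of the (guessed) complex column widths
   * @param variableWidth    sum of the variable-width column widths
   * @param incomingRowCount rows in the incoming batch
   */
  static int outputRowCount(int outputBatchSize, int fixedWidth, int complexWidth,
                            int variableWidth, int incomingRowCount) {
    int rowWidth = fixedWidth + complexWidth + variableWidth;
    if (rowWidth == 0) {
      // Not enough information to size the batch: pass every row through.
      return incomingRowCount;
    }
    // Assumption: rows that fit = budget / row width, with no extra clamping.
    int rowsThatFit = outputBatchSize / rowWidth;
    // Never emit more rows than actually arrived.
    return Math.min(rowsThatFit, incomingRowCount);
  }

  public static void main(String[] args) {
    System.out.println(outputRowCount(1024, 8, 50, 70, 100));
    System.out.println(outputRowCount(1024, 0, 0, 0, 42));
  }
}
```

With a 1024-byte budget and a 128-byte row, the first call is limited by the budget rather than by the 100 incoming rows. The second call has zero row width and so returns the incoming count unchanged.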


[GitHub] [drill] asfgit closed pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944
 
 
   


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362534333
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
+                String inputColumnName, String outputColumnName) {
+    if(isFixedWidth(vv)) {
+      addFixedWidthField(vv);
+    } else {
+      addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
     }
-
-    public static String printVV(ValueVector vv) {
-        String str = "null";
-        if (vv != null) {
-            str = vv.getField().getName() + " " + vv.getField().getType();
-        }
-        return str;
+  }
+
+  private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
+                                     OutputColumnType outputColumnType, String inputColumnName,
+                                     String outputColumnName) {
+    variableWidthColumnCount++;
+    ColumnWidthInfo columnWidthInfo;
+    logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
+            printVV(vv), variableWidthColumnCount, outputColumnType);
+    // Variable width transfers
+    if (outputColumnType == OutputColumnType.TRANSFER) {
+      VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
+      columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
+    } else if (isComplex(vv.getField().getType())) {
+      addComplexField(vv);
+      return;
+    } else {
+      // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
+      OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+      OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
+      columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
     }
-
-    void addComplexField(ValueVector vv) {
-        //Complex types are not yet supported. Just use a guess for the size
-        assert vv == null || isComplex(vv.getField().getType());
-        complexColumnsCount++;
-        // just a guess
-        totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
-        logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+    ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
+    Preconditions.checkState(existingInfo == null);
+  }
+
+  public static String printVV(ValueVector vv) {
+    String str = "null";
+    if (vv != null) {
+      str = vv.getField().getName() + " " + vv.getField().getType();
     }
-
-    void addFixedWidthField(ValueVector vv) {
-        assert isFixedWidth(vv);
-        fixedWidthColumnCount++;
-        int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
-        totalFixedWidthColumnWidth += fixedFieldWidth;
-        logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+    return str;
+  }
+
+  void addComplexField(ValueVector vv) {
+    //Complex types are not yet supported. Just use a guess for the size
+    assert vv == null || isComplex(vv.getField().getType());
+    complexColumnsCount++;
+    // just a guess
+    totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
+    logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+  }
+
+  void addFixedWidthField(ValueVector vv) {
+    assert isFixedWidth(vv);
+    fixedWidthColumnCount++;
+    int fixedFieldWidth = getNetWidthOfFixedWidthType(vv);
+    totalFixedWidthColumnWidth += fixedFieldWidth;
+    logger.trace("addFixedWidthField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
+            printVV(vv), fixedWidthColumnCount, totalFixedWidthColumnWidth);
+  }
+
+  public void init(RecordBatch incomingBatch, ProjectRecordBatch outgoingBatch) {
+    setIncomingBatch(incomingBatch);
+    setOutgoingBatch(outgoingBatch);
+    reset();
+
+    RecordBatchStats.printConfiguredBatchSize(outgoingBatch.getRecordBatchStatsContext(),
+      getOutputBatchSize());
+  }
+
+  private void reset() {
+    rowWidth = 0;
+    totalFixedWidthColumnWidth = 0;
+    totalComplexColumnWidth = 0;
+
+    fixedWidthColumnCount = 0;
+    complexColumnsCount = 0;
+  }
+
+  @Override
+  public void update() {
+    long updateStartTime = System.currentTimeMillis();
+    RecordBatchSizer batchSizer = new RecordBatchSizer(incomingBatch);
+    long batchSizerEndTime = System.currentTimeMillis();
+
+    setRecordBatchSizer(batchSizer);
+    rowWidth = 0;
+    int totalVariableColumnWidth = 0;
+    for (String outputColumnName : outputColumnSizes.keySet()) {
+      ColumnWidthInfo columnWidthInfo = outputColumnSizes.get(outputColumnName);
+      int width = -1;
+      if (columnWidthInfo.isFixedWidth()) {
+        // fixed width columns are accumulated in totalFixedWidthColumnWidth
+        ShouldNotReachHere();
+      } else {
+        //Walk the tree of OutputWidthExpressions to get a FixedLenExpr
+        //As the tree is walked, the RecordBatchSizer and function annotations
+        //are looked-up to come up with the final FixedLenExpr
+        OutputWidthExpression savedWidthExpr = columnWidthInfo.getOutputExpression();
+        OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+        OutputWidthExpression reducedExpr = savedWidthExpr.accept(new OutputWidthVisitor(), state);
+        width = ((FixedLenExpr)reducedExpr).getDataWidth();
+        Preconditions.checkState(width >= 0);
+        int metadataWidth = getMetadataWidth(columnWidthInfo.outputVV);
+        logger.trace("update(): fieldName {} width: {} metadataWidth: {}",
+                columnWidthInfo.outputVV.getField().getName(), width, metadataWidth);
+        width += metadataWidth;
+      }
+      totalVariableColumnWidth += width;
     }
-
-    public void init(RecordBatch incomingBatch, ProjectRecordBatch outgoingBatch) {
-        setIncomingBatch(incomingBatch);
-        setOutgoingBatch(outgoingBatch);
-        reset();
-
-        RecordBatchStats.printConfiguredBatchSize(outgoingBatch.getRecordBatchStatsContext(),
-          getOutputBatchSize());
+    rowWidth += totalFixedWidthColumnWidth;
+    rowWidth += totalComplexColumnWidth;
+    rowWidth += totalVariableColumnWidth;
+    int outPutRowCount;
+    if (rowWidth != 0) {
+      //if rowWidth is not zero, set the output row count in the sizer
+      setOutputRowCount(getOutputBatchSize(), rowWidth);
+      // if more rows can be allowed than the incoming row count, then set the
+      // output row count to the incoming row count.
+      outPutRowCount = Math.min(getOutputRowCount(), batchSizer.rowCount());
+    } else {
+      // if rowWidth == 0 then the memory manager does
+      // not have sufficient information to size the batch
+      // let the entire batch pass through.
+      // If incoming rc == 0, all RB Sizer look-ups will have
+      // 0 width and so total width can be 0
+      outPutRowCount = incomingBatch.getRecordCount();
     }
-
-    private void reset() {
-        rowWidth = 0;
-        totalFixedWidthColumnWidth = 0;
-        totalComplexColumnWidth = 0;
-
-        fixedWidthColumnCount = 0;
-        complexColumnsCount = 0;
+    setOutputRowCount(outPutRowCount);
+    long updateEndTime = System.currentTimeMillis();
+    logger.trace("update() : Output RC {}, BatchSizer RC {}, incoming RC {}, width {}, total fixed width {}"
+                + ", total variable width {}, total complex width {}, batchSizer time {} ms, update time {}  ms"
+                + ", manager {}, incoming {}",outPutRowCount, batchSizer.rowCount(), incomingBatch.getRecordCount(),
+                rowWidth, totalFixedWidthColumnWidth, totalVariableColumnWidth, totalComplexColumnWidth,
+                (batchSizerEndTime - updateStartTime),(updateEndTime - updateStartTime), this, incomingBatch);
+
+    RecordBatchStats.logRecordBatchStats(RecordBatchIOType.INPUT, getRecordBatchSizer(), outgoingBatch.getRecordBatchStatsContext());
+    updateIncomingStats();
+  }
+
+  public static int getMetadataWidth(ValueVector vv) {
+    int width = 0;
+    if (vv instanceof NullableVector) {
+      width += ((NullableVector)vv).getBitsVector().getPayloadByteCount(1);
     }
 
-    @Override
-    public void update() {
-        long updateStartTime = System.currentTimeMillis();
-        RecordBatchSizer batchSizer = new RecordBatchSizer(incomingBatch);
-        long batchSizerEndTime = System.currentTimeMillis();
-
-        setRecordBatchSizer(batchSizer);
-        rowWidth = 0;
-        int totalVariableColumnWidth = 0;
-        for (String outputColumnName : outputColumnSizes.keySet()) {
-            ColumnWidthInfo columnWidthInfo = outputColumnSizes.get(outputColumnName);
-            int width = -1;
-            if (columnWidthInfo.isFixedWidth()) {
-                // fixed width columns are accumulated in totalFixedWidthColumnWidth
-                ShouldNotReachHere();
-            } else {
-                //Walk the tree of OutputWidthExpressions to get a FixedLenExpr
-                //As the tree is walked, the RecordBatchSizer and function annotations
-                //are looked-up to come up with the final FixedLenExpr
-                OutputWidthExpression savedWidthExpr = columnWidthInfo.getOutputExpression();
-                OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-                OutputWidthExpression reducedExpr = savedWidthExpr.accept(new OutputWidthVisitor(), state);
-                width = ((FixedLenExpr)reducedExpr).getDataWidth();
-                Preconditions.checkState(width >= 0);
-                int metadataWidth = getMetadataWidth(columnWidthInfo.outputVV);
-                logger.trace("update(): fieldName {} width: {} metadataWidth: {}",
-                        columnWidthInfo.outputVV.getField().getName(), width, metadataWidth);
-                width += metadataWidth;
-            }
-            totalVariableColumnWidth += width;
-        }
-        rowWidth += totalFixedWidthColumnWidth;
-        rowWidth += totalComplexColumnWidth;
-        rowWidth += totalVariableColumnWidth;
-        int outPutRowCount;
-        if (rowWidth != 0) {
-            //if rowWidth is not zero, set the output row count in the sizer
-            setOutputRowCount(getOutputBatchSize(), rowWidth);
-            // if more rows can be allowed than the incoming row count, then set the
-            // output row count to the incoming row count.
-            outPutRowCount = Math.min(getOutputRowCount(), batchSizer.rowCount());
-        } else {
-            // if rowWidth == 0 then the memory manager does
-            // not have sufficient information to size the batch
-            // let the entire batch pass through.
-            // If incoming rc == 0, all RB Sizer look-ups will have
-            // 0 width and so total width can be 0
-            outPutRowCount = incomingBatch.getRecordCount();
-        }
-        setOutputRowCount(outPutRowCount);
-        long updateEndTime = System.currentTimeMillis();
-        logger.trace("update() : Output RC {}, BatchSizer RC {}, incoming RC {}, width {}, total fixed width {}"
-                    + ", total variable width {}, total complex width {}, batchSizer time {} ms, update time {}  ms"
-                    + ", manager {}, incoming {}",outPutRowCount, batchSizer.rowCount(), incomingBatch.getRecordCount(),
-                    rowWidth, totalFixedWidthColumnWidth, totalVariableColumnWidth, totalComplexColumnWidth,
-                    (batchSizerEndTime - updateStartTime),(updateEndTime - updateStartTime), this, incomingBatch);
-
-        RecordBatchStats.logRecordBatchStats(RecordBatchIOType.INPUT, getRecordBatchSizer(), outgoingBatch.getRecordBatchStatsContext());
-        updateIncomingStats();
+    if (vv instanceof VariableWidthVector) {
+      width += ((VariableWidthVector)vv).getOffsetVector().getPayloadByteCount(1);
 
 Review comment:
   ```suggestion
         width += ((VariableWidthVector) vv).getOffsetVector().getPayloadByteCount(1);
   ```
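
As context for the two casts being discussed: the visible part of `getMetadataWidth()` adds the per-value size of the bits vector when the vector is nullable, and the per-value size of the offset vector when it is variable width. A rough sketch of that accumulation is below. The class name, the boolean parameters, and the two byte constants are hypothetical. The values 1 and 4 are assumptions about what `getPayloadByteCount(1)` returns for those vectors and are not taken from the diff. The remainder of the method is not shown in this hunk and is not modelled here.

```java
/** Illustrative only: models the two visible branches of getMetadataWidth(). */
final class MetadataWidthSketch {

  // Assumed per-value overheads; the real values come from getPayloadByteCount(1).
  static final int BITS_BYTES_PER_VALUE = 1;
  static final int OFFSET_BYTES_PER_VALUE = 4;

  private MetadataWidthSketch() { }

  /**
   * Returns the per-row metadata overhead of a column.
   *
   * @param nullable      stands in for "vv instanceof NullableVector"
   * @param variableWidth stands in for "vv instanceof VariableWidthVector"
   */
  static int metadataWidth(boolean nullable, boolean variableWidth) {
    int width = 0;
    if (nullable) {
      // One "is set" entry per value.
      width += BITS_BYTES_PER_VALUE;
    }
    if (variableWidth) {
      // One offset entry per value.
      width += OFFSET_BYTES_PER_VALUE;
    }
    return width;
  }

  public static void main(String[] args) {
    System.out.println(metadataWidth(true, true));
    System.out.println(metadataWidth(false, false));
  }
}
```

In `update()` this overhead is added to the data width of each variable-width column before the widths are summed into the row width.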


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362454960
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
 
 Review comment:
   ```suggestion
     private static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
   ```


[GitHub] [drill] paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362702109
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
 
 Review comment:
   Done. I tried to limit the "blast radius" by gritting my teeth and leaving this class as it was. This seems to be some offshoot of my "temporary" `BatchSizer`. Just for background, some time ago we realized that not all Drill rows are of the same width. (Who'd have guessed?) So, we added "temporary" code to work out sizes in each operator. This was supposed to be temporary until batches carried their own size info. (Once we figure it out for a vector, we don't have to throw it away and figure it out again for each operator.)
   
   In fact, I deliberately chose a goofy name "BatchSizer" to remind folks that this was a temporary hack so I could focus on the "real work" of fixing the external sort. Sigh...
   
   But, the temporary solution seems to have become semi-permanent and has grown odd variations such as this.
   
   The "master plan" was to not try to predict batch sizes as we are doing here. Instead, the `ResultSetLoader` is intended to just let the operator write to the batch until we hit the desired limit. All the calcs are already done for the reader case. The goal was to use the same mechanism in other places where we don't know widths ahead of time. Project is the classic case: for all we know, the user is doing some silly function like repeating a big `VARCHAR` 100 times. So, if we can ever get there, all this temporary stuff will be swept away.
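   
   A minimal sketch of the "write until we hit the limit" idea, for illustration only. `WriteUntilFullSketch`, `BoundedBatch`, `project` and `repeat` are hypothetical names made up for this example; this is not the `ResultSetLoader` API, and the byte counting is a stand-in for real vector accounting. The point is that the batch measures what is actually written, so no up-front width estimate is needed:
   
   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.ArrayList;
   import java.util.List;
   
   /** Hypothetical illustration; names and structure are assumptions, not Drill code. */
   final class WriteUntilFullSketch {
   
     /** Toy output batch that tracks the bytes actually written to it. */
     static final class BoundedBatch {
       private final int byteLimit;
       private final List<byte[]> rows = new ArrayList<>();
       private int bytesUsed;
   
       BoundedBatch(int byteLimit) {
         this.byteLimit = byteLimit;
       }
   
       /** Appends the row if it fits. The first row is always accepted so progress is possible. */
       boolean tryWrite(byte[] row) {
         if (!rows.isEmpty() && bytesUsed + row.length > byteLimit) {
           return false;
         }
         rows.add(row);
         bytesUsed += row.length;
         return true;
       }
   
       int rowCount() { return rows.size(); }
   
       int bytesUsed() { return bytesUsed; }
     }
   
     /** Stand-in for a projection whose output width is not known ahead of time. */
     static String repeat(String value, int times) {
       StringBuilder sb = new StringBuilder();
       for (int i = 0; i < times; i++) {
         sb.append(value);
       }
       return sb.toString();
     }
   
     /** Projects each input value and starts a new batch only when the current one is full. */
     static List<BoundedBatch> project(List<String> input, int times, int byteLimit) {
       List<BoundedBatch> batches = new ArrayList<>();
       BoundedBatch current = new BoundedBatch(byteLimit);
       for (String value : input) {
         byte[] row = repeat(value, times).getBytes(StandardCharsets.UTF_8);
         if (!current.tryWrite(row)) {
           // Batch is full: hand it downstream and retry the row in a fresh batch.
           batches.add(current);
           current = new BoundedBatch(byteLimit);
           current.tryWrite(row);
         }
       }
       if (current.rowCount() > 0) {
         batches.add(current);
       }
       return batches;
     }
   }
   ```
   
   For example, `WriteUntilFullSketch.project(Arrays.asList("ab", "cd", "ef"), 2, 8)` fills the first batch with two 4-byte rows and rolls the third row over into a second batch.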


[GitHub] [drill] paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362702397
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
 
 Review comment:
   Thanks for looking at this in detail. Fixed.


[GitHub] [drill] paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362704446
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
 
 Review comment:
   Nice!


[GitHub] [drill] ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362529595
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##########
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * <p>
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * <p>
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-    static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-    public RecordBatch getIncomingBatch() {
-        return incomingBatch;
+  static final Logger logger = LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map<String, ColumnWidthInfo> outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+      FIXED,
+      VARIABLE
+  }
+
+  public enum OutputColumnType {
+      TRANSFER,
+      NEW
+  }
+
+  public static class ColumnWidthInfo {
+    private final OutputWidthExpression outputExpression;
+    private final int width;
+    private final WidthType widthType;
+    private final OutputColumnType outputColumnType;
+    private final ValueVector outputVV; // for transfers, this is the transfer src
+
+
+    ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+                    OutputColumnType outputColumnType,
+                    WidthType widthType,
+                    int fieldWidth, ValueVector outputVV) {
+      this.outputExpression = outputWidthExpression;
+      this.width = fieldWidth;
+      this.outputColumnType = outputColumnType;
+      this.widthType = widthType;
+      this.outputVV = outputVV;
     }
 
-    RecordBatch incomingBatch = null;
-    ProjectRecordBatch outgoingBatch = null;
+    public OutputWidthExpression getOutputExpression() { return outputExpression; }
 
-    int rowWidth = 0;
-    Map<String, ColumnWidthInfo> outputColumnSizes;
-    // Number of variable width columns in the batch
-    int variableWidthColumnCount = 0;
-    // Number of fixed width columns in the batch
-    int fixedWidthColumnCount = 0;
-    // Number of complex columns in the batch
-    int complexColumnsCount = 0;
+    public OutputColumnType getOutputColumnType() { return outputColumnType; }
 
+    public boolean isFixedWidth() { return widthType == WidthType.FIXED; }
 
-    // Holds sum of all fixed width column widths
-    int totalFixedWidthColumnWidth = 0;
-    // Holds sum of all complex column widths
-    // Currently, this is just a guess
-    int totalComplexColumnWidth = 0;
-
-    enum WidthType {
-        FIXED,
-        VARIABLE
-    }
-
-    enum OutputColumnType {
-        TRANSFER,
-        NEW
-    }
+    public int getWidth() { return width; }
+  }
 
-    class ColumnWidthInfo {
-        OutputWidthExpression outputExpression;
-        int width;
-        WidthType widthType;
-        OutputColumnType outputColumnType;
-        ValueVector outputVV; // for transfers, this is the transfer src
+  public RecordBatch getIncomingBatch() {
+    return incomingBatch;
+  }
 
+  void ShouldNotReachHere() {
+    throw new IllegalStateException();
+  }
 
-        ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
-                        OutputColumnType outputColumnType,
-                        WidthType widthType,
-                        int fieldWidth, ValueVector outputVV) {
-            this.outputExpression = outputWidthExpression;
-            this.width = fieldWidth;
-            this.outputColumnType = outputColumnType;
-            this.widthType = widthType;
-            this.outputVV = outputVV;
-        }
+  private void setIncomingBatch(RecordBatch recordBatch) {
+    incomingBatch = recordBatch;
+  }
 
-        public OutputWidthExpression getOutputExpression() { return outputExpression; }
+  public RecordBatch incomingBatch() { return incomingBatch; }
 
-        public OutputColumnType getOutputColumnType() { return outputColumnType; }
+  private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
+    this.outgoingBatch = outgoingBatch;
+  }
 
-        boolean isFixedWidth() { return widthType == WidthType.FIXED; }
+  public ProjectMemoryManager(int configuredOutputSize) {
+    super(configuredOutputSize);
+    outputColumnSizes = new HashMap<>();
+  }
 
-        public int getWidth() { return width; }
-
-    }
-
-    void ShouldNotReachHere() {
-        throw new IllegalStateException();
-    }
-
-    private void setIncomingBatch(RecordBatch recordBatch) {
-        incomingBatch = recordBatch;
-    }
-
-    private void setOutgoingBatch(ProjectRecordBatch outgoingBatch) {
-        this.outgoingBatch = outgoingBatch;
-    }
+  public boolean isComplex(MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
+  }
 
-    public ProjectMemoryManager(int configuredOutputSize) {
-        super(configuredOutputSize);
-        outputColumnSizes = new HashMap<>();
-    }
+  boolean isFixedWidth(TypedFieldId fieldId) {
+    ValueVector vv = getOutgoingValueVector(fieldId);
+    return isFixedWidth(vv);
+  }
 
-    public boolean isComplex(MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        return minorType == MinorType.MAP || minorType == MinorType.UNION || minorType == MinorType.LIST;
-    }
+  public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
+    Class<?> clazz = fieldId.getIntermediateClass();
+    int[] fieldIds = fieldId.getFieldIds();
+    return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
+  }
 
-    boolean isFixedWidth(TypedFieldId fieldId) {
-        ValueVector vv = getOutgoingValueVector(fieldId);
-        return isFixedWidth(vv);
-    }
+  static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
 
-    public ValueVector getOutgoingValueVector(TypedFieldId fieldId) {
-        Class<?> clazz = fieldId.getIntermediateClass();
-        int[] fieldIds = fieldId.getFieldIds();
-        return outgoingBatch.getValueAccessorById(clazz, fieldIds).getValueVector();
-    }
 
-    static boolean isFixedWidth(ValueVector vv) {  return (vv instanceof FixedWidthVector); }
+  static int getNetWidthOfFixedWidthType(ValueVector vv) {
+    assert isFixedWidth(vv);
+    return ((FixedWidthVector)vv).getValueWidth();
+  }
 
+  public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
+    MinorType minorType = majorType.getMinorType();
+    final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
+            || minorType == MinorType.VARBINARY);
 
-    static int getNetWidthOfFixedWidthType(ValueVector vv) {
-        assert isFixedWidth(vv);
-        return ((FixedWidthVector)vv).getValueWidth();
+    if (isVariableWidth) {
+      throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
     }
 
-    public static int getDataWidthOfFixedWidthType(TypeProtos.MajorType majorType) {
-        MinorType minorType = majorType.getMinorType();
-        final boolean isVariableWidth  = (minorType == MinorType.VARCHAR || minorType == MinorType.VAR16CHAR
-                || minorType == MinorType.VARBINARY);
-
-        if (isVariableWidth) {
-            throw new IllegalArgumentException("getWidthOfFixedWidthType() cannot handle variable width types");
-        }
-
-        if (minorType == MinorType.NULL) {
-            return 0;
-        }
-
-        return TypeHelper.getSize(majorType);
+    if (minorType == MinorType.NULL) {
+      return 0;
     }
 
+    return TypeHelper.getSize(majorType);
+  }
 
-    void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
-        addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
-    }
 
-    void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
-        addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
-    }
+  void addTransferField(ValueVector vvIn, String inputColumnName, String outputColumnName) {
+    addField(vvIn, null, OutputColumnType.TRANSFER, inputColumnName, outputColumnName);
+  }
 
-    void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
-                  String inputColumnName, String outputColumnName) {
-        if(isFixedWidth(vv)) {
-            addFixedWidthField(vv);
-        } else {
-            addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
-        }
-    }
+  void addNewField(ValueVector vvOut, LogicalExpression logicalExpression) {
+    addField(vvOut, logicalExpression, OutputColumnType.NEW, null, vvOut.getField().getName());
+  }
 
-    private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
-                                       OutputColumnType outputColumnType, String inputColumnName, String outputColumnName) {
-        variableWidthColumnCount++;
-        ColumnWidthInfo columnWidthInfo;
-        logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
-                printVV(vv), variableWidthColumnCount, outputColumnType);
-        //Variable width transfers
-        if(outputColumnType == OutputColumnType.TRANSFER) {
-            VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
-            columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
-        } else if (isComplex(vv.getField().getType())) {
-            addComplexField(vv);
-            return;
-        } else {
-            // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
-            OutputWidthVisitorState state = new OutputWidthVisitorState(this);
-            OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
-            columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
-                    WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
-        }
-        ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
-        Preconditions.checkState(existingInfo == null);
+  void addField(ValueVector vv, LogicalExpression logicalExpression, OutputColumnType outputColumnType,
+                String inputColumnName, String outputColumnName) {
+    if(isFixedWidth(vv)) {
+      addFixedWidthField(vv);
+    } else {
+      addVariableWidthField(vv, logicalExpression, outputColumnType, inputColumnName, outputColumnName);
     }
-
-    public static String printVV(ValueVector vv) {
-        String str = "null";
-        if (vv != null) {
-            str = vv.getField().getName() + " " + vv.getField().getType();
-        }
-        return str;
+  }
+
+  private void addVariableWidthField(ValueVector vv, LogicalExpression logicalExpression,
+                                     OutputColumnType outputColumnType, String inputColumnName,
+                                     String outputColumnName) {
+    variableWidthColumnCount++;
+    ColumnWidthInfo columnWidthInfo;
+    logger.trace("addVariableWidthField(): vv {} totalCount: {} outputColumnType: {}",
+            printVV(vv), variableWidthColumnCount, outputColumnType);
+    // Variable width transfers
+    if (outputColumnType == OutputColumnType.TRANSFER) {
+      VarLenReadExpr readExpr = new VarLenReadExpr(inputColumnName);
+      columnWidthInfo = new ColumnWidthInfo(readExpr, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the RecordBatchSizer
+    } else if (isComplex(vv.getField().getType())) {
+      addComplexField(vv);
+      return;
+    } else {
+      // Walk the tree of LogicalExpressions to get a tree of OutputWidthExpressions
+      OutputWidthVisitorState state = new OutputWidthVisitorState(this);
+      OutputWidthExpression outputWidthExpression = logicalExpression.accept(new OutputWidthVisitor(), state);
+      columnWidthInfo = new ColumnWidthInfo(outputWidthExpression, outputColumnType,
+              WidthType.VARIABLE, -1, vv); //fieldWidth has to be obtained from the OutputWidthExpression
     }
-
-    void addComplexField(ValueVector vv) {
-        //Complex types are not yet supported. Just use a guess for the size
-        assert vv == null || isComplex(vv.getField().getType());
-        complexColumnsCount++;
-        // just a guess
-        totalComplexColumnWidth +=  OutputSizeEstimateConstants.COMPLEX_FIELD_ESTIMATE;
-        logger.trace("addComplexField(): vv {} totalCount: {} totalComplexColumnWidth: {}",
-                printVV(vv), complexColumnsCount, totalComplexColumnWidth);
+    ColumnWidthInfo existingInfo = outputColumnSizes.put(outputColumnName, columnWidthInfo);
+    Preconditions.checkState(existingInfo == null);
+  }
+
+  public static String printVV(ValueVector vv) {
 
 Review comment:
   ```suggestion
     private static String printVV(ValueVector vv) {
   ```
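
For context, a minimal sketch of why the narrower visibility is safe, assuming `printVV` is only called from trace logging inside this class (not verified against the rest of the codebase). `Vector` and `Field` are hypothetical stand-ins for Drill's `ValueVector` and its field metadata, and `traceLine` is an invented caller used only for illustration.

```java
// Hypothetical stand-ins; these are NOT Drill's real classes.
public class PrintVvSketch {

  /** Stand-in for the field metadata a vector exposes (name and type). */
  public static final class Field {
    private final String name;
    private final String type;

    public Field(String name, String type) {
      this.name = name;
      this.type = type;
    }

    String getName() { return name; }
    String getType() { return type; }
  }

  /** Stand-in for a value vector that only knows its field. */
  public static final class Vector {
    private final Field field;

    public Vector(Field field) {
      this.field = field;
    }

    Field getField() { return field; }
  }

  // Private: callable only from within this class, as in the suggestion.
  // Null-safe, mirroring the helper shown in the diff.
  private static String printVV(Vector vv) {
    if (vv == null) {
      return "null";
    }
    return vv.getField().getName() + " " + vv.getField().getType();
  }

  // Invented caller: formats a trace-style message using the private helper.
  public static String traceLine(Vector vv) {
    return "addComplexField(): vv " + printVV(vv);
  }

  public static void main(String[] args) {
    System.out.println(traceLine(new Vector(new Field("a", "VARCHAR"))));
    System.out.println(traceLine(null));
  }
}
```

If some other class does call `printVV`, the compiler will flag it as soon as the modifier changes, so the suggestion is cheap to try.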
