Posted to dev@hive.apache.org by "Edward Capriolo (JIRA)" <ji...@apache.org> on 2013/10/02 17:16:43 UTC

[jira] [Commented] (HIVE-5423) Speed up testing of scalar UDFS

    [ https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784056#comment-13784056 ] 

Edward Capriolo commented on HIVE-5423:
---------------------------------------

It is wasteful to launch end-to-end jobs to test every scalar UDF. I came across the TestOperators class, which tests at the operator level, and created a new class along the same lines:

{code}
public class TestSimpleExecDriver extends TestCase {

  public static ExprNodeColumnDesc getStringColumn(String columnName) {
    return new ExprNodeColumnDesc(TypeInfoFactory.stringTypeInfo, columnName, "", false);
  }


  public void testConcatUdf() throws Throwable {
    long start = System.currentTimeMillis();
    DataBuilder db = new DataBuilder();
    db.setColumnNames("a", "b", "c");
    db.setColumnTypes(
        PrimitiveObjectInspectorFactory.javaStringObjectInspector,
        PrimitiveObjectInspectorFactory.javaStringObjectInspector,
        PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    db.addRow("one", "two", "three");
    db.addRow("four", "two", "three");
    db.addRow(null, "two", "three");
    InspectableObject[] r = db.createRows();

    ExprNodeDesc expr1 = getStringColumn("a");
    ExprNodeDesc expr2 = getStringColumn("b");
    ExprNodeDesc exprDesc2 = TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("concat", expr1, expr2);
    ArrayList<ExprNodeDesc> earr = new ArrayList<ExprNodeDesc>();
    earr.add(expr1);
    earr.add(exprDesc2);
    ArrayList<String> outputCols = new ArrayList<String>();
    for (int i = 0; i < earr.size(); i++) {
      outputCols.add("_col" + i);
    }
    SelectDesc selectCtx = new SelectDesc(earr, outputCols);
    Operator<SelectDesc> op = OperatorFactory.get(SelectDesc.class);
    op.setConf(selectCtx);

    CollectDesc cd = new CollectDesc(Integer.valueOf(10));
    CollectOperator cdop = (CollectOperator) OperatorFactory.getAndMakeChild(cd, op);

    op.initialize(new JobConf(TestSimpleExecDriver.class), new ObjectInspector[] {r[0].oi});
    for (int i = 0; i < r.length; i++) {
      op.process(r[i].o, 0);
    }
    op.close(false);

    InspectableObject io = new InspectableObject();

    cdop.retrieve(io);
    assertEquals("one", interogatePrimitiveObjectFromIo("_col0", io));
    assertEquals("onetwo", interogatePrimitiveObjectFromIo("_col1", io));

    cdop.retrieve(io);
    assertEquals("four", interogatePrimitiveObjectFromIo("_col0", io));
    assertEquals("fourtwo", interogatePrimitiveObjectFromIo("_col1", io));

    cdop.retrieve(io);
    assertEquals(null, interogatePrimitiveObjectFromIo("_col0", io));
    assertEquals(null, interogatePrimitiveObjectFromIo("_col1", io));
    System.out.println("took " + (System.currentTimeMillis() - start) + " ms");
  }

  // Looks up the named column in the collected struct row and returns its Java value.
  private Object interogatePrimitiveObjectFromIo(String column, InspectableObject io){
    StructObjectInspector soi = (StructObjectInspector) io.oi;
    StructField wantedField = soi.getStructFieldRef(column);
    return ((PrimitiveObjectInspector) wantedField.getFieldObjectInspector()).getPrimitiveJavaObject(soi
        .getStructFieldData(io.o, wantedField));
  }

}
{code}
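
To make the shape of the test above easier to see without the Hive classes, here is a minimal plain-Java sketch of the same flow: rows are pushed into a select step, which evaluates its expressions and forwards the result to a collecting child. Everything in it (OperatorChainSketch, Select, Collector, and the concat stand-in) is hypothetical and is not Hive API; it only assumes that concat returns null when an argument is null, as the assertions above expect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java stand-in for the select -> collect chain. None of these types
// are Hive classes; they only mirror the shape of the test above.
class OperatorChainSketch {

  // Hypothetical child operator: buffers every row it receives.
  static final class Collector {
    final List<Object[]> rows = new ArrayList<>();

    void process(Object[] row) {
      rows.add(row);
    }
  }

  // Hypothetical select operator: evaluates each expression against the
  // input row and forwards the projected row to its child.
  static final class Select {
    private final List<Function<Object[], Object>> exprs;
    private final Collector child;

    Select(List<Function<Object[], Object>> exprs, Collector child) {
      this.exprs = exprs;
      this.child = child;
    }

    void process(Object[] row) {
      Object[] out = new Object[exprs.size()];
      for (int i = 0; i < out.length; i++) {
        out[i] = exprs.get(i).apply(row);
      }
      child.process(out);
    }
  }

  // Stand-in for the UDF under test. It returns null when either argument is
  // null, which is the behavior the assertions in the test above expect.
  static String concat(String a, String b) {
    return (a == null || b == null) ? null : a + b;
  }

  // Pushes the same three rows through the chain and returns the output.
  static List<Object[]> run() {
    List<Function<Object[], Object>> exprs = new ArrayList<>();
    exprs.add(row -> row[0]);                                    // _col0 = a
    exprs.add(row -> concat((String) row[0], (String) row[1])); // _col1 = concat(a, b)

    Collector collector = new Collector();
    Select select = new Select(exprs, collector);
    select.process(new Object[] {"one", "two", "three"});
    select.process(new Object[] {"four", "two", "three"});
    select.process(new Object[] {null, "two", "three"});
    return collector.rows;
  }

  public static void main(String[] args) {
    for (Object[] row : run()) {
      System.out.println(row[0] + "\t" + row[1]);
    }
  }
}
```

The point of the sketch is only that nothing outside the process is involved: no job is submitted, so the cost is the cost of calling process() once per row.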

This has many advantages:
1) fast: the test above runs in about half a second
2) no qtests
3) no qtest generation step
4) runs with no external properties or other mumbo jumbo
5) easier to establish code coverage

I do not see anything that this style of testing is missing. Can any of the other devs think of something?

If no one sees a problem with testing this way, I propose we convert all scalar UDF .q tests into code like this. We will probably save hours of testing time and make it easier for people to add UDFs to Hive.
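
One possible way to structure such a conversion, sketched here in plain Java, is to turn each .q file's input rows and expected output into a table of cases that a single loop checks. The names ScalarCaseTableSketch, Case, failures, and concatCases are hypothetical, and the concat method is a stand-in; in a real conversion the function argument would presumably evaluate the UDF through the operator, as in the test above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.BiFunction;

// Hypothetical table-driven layout for converted tests; not part of Hive.
class ScalarCaseTableSketch {

  // One former query row: two inputs and the expected result.
  static final class Case {
    final String left;
    final String right;
    final String expected;

    Case(String left, String right, String expected) {
      this.left = left;
      this.right = right;
      this.expected = expected;
    }
  }

  // Returns a description of every case whose actual result differs from
  // the expected one; an empty list means all cases passed.
  static List<String> failures(BiFunction<String, String, String> fn, List<Case> cases) {
    List<String> failed = new ArrayList<>();
    for (Case c : cases) {
      String actual = fn.apply(c.left, c.right);
      if (!Objects.equals(c.expected, actual)) {
        failed.add("(" + c.left + ", " + c.right + ") gave " + actual
            + ", expected " + c.expected);
      }
    }
    return failed;
  }

  // Stand-in for the function under test (assumed null-in, null-out).
  static String concat(String a, String b) {
    return (a == null || b == null) ? null : a + b;
  }

  // The three rows from the test above, written as data.
  static List<Case> concatCases() {
    return Arrays.asList(
        new Case("one", "two", "onetwo"),
        new Case("four", "two", "fourtwo"),
        new Case(null, "two", null));
  }

  public static void main(String[] args) {
    System.out.println(failures(ScalarCaseTableSketch::concat, concatCases()));
  }
}
```

With this layout, adding coverage for another input is one more Case line rather than another query and expected-output file.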

> Speed up testing of scalar UDFS
> -------------------------------
>
>                 Key: HIVE-5423
>                 URL: https://issues.apache.org/jira/browse/HIVE-5423
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>



