Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/07/13 00:38:20 UTC
[jira] [Updated] (DRILL-4776) Errata, questions about UDF documentation

     [ https://issues.apache.org/jira/browse/DRILL-4776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-4776:
-------------------------------
    Description: 
See the documentation at https://drill.apache.org/docs/develop-custom-functions-introduction/

"Simple Function: A simple function operates on a single row and produces a single row as the output."

Some explanation is needed. In SQL, the function accepts a single column and produces a new column as output: SELECT myFunc( x ) FROM y; The example string and math functions are, in fact, column (technically "scalar") functions.
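
To make the "scalar" point concrete, here is a minimal sketch in plain Java (not Drill code) of what SELECT myFunc( x ) FROM y amounts to: the function is applied independently to each value of column x and yields one value of the new column per row. The class and method names (ScalarFunctionSketch, applyToColumn) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Drill.
final class ScalarFunctionSketch {

    // Applies myFunc to each value of one column and returns the new column.
    // Each output value depends only on the input value from the same row.
    static <T, R> List<R> applyToColumn(List<T> column, Function<T, R> myFunc) {
        List<R> result = new ArrayList<>(column.size());
        for (T value : column) {
            result.add(myFunc.apply(value));
        }
        return result;
    }
}
```

For example, applying an upper-casing function to a column holding "a" and "b" would give a column holding "A" and "B", with the row count unchanged.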

Process, item 3: Explain why Drill needs the source files.

On this page: https://drill.apache.org/docs/developing-a-simple-function/

Step 1 has a Maven dependency on version 1.1.0 of Drill. It is probably obvious to most folks, but the user must replace the 1.1.0 with the version of Drill that is running on their cluster.

Step 3: it is not clear if the "bit holders" are parameters to a function or are member variables into which values are injected. Some more background about the runtime flow would help answer this question. That is, what does Drill do with the class? When is an instance created? How are values passed in?

Step 4: are setup( ) and eval( ) overrides? If so, add the standard @Override annotation to help the user understand that these are overrides. Otherwise, these might be "magic method names" (like "main"), so the user has to know to use exactly those names (and signatures).

Step 4: explain the purpose of the setup( ) method. When is it called? Once per Drillbit session? Once per fragment? Once per row? How do we intend it to be used? (This method is described on the aggregates page; perhaps just say, "the setup method is described later in this tutorial.")
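
To show the kind of background the Step 3 and Step 4 questions are asking for, here is a minimal sketch of one possible answer, written in plain Java. It is an assumption, not a description of Drill's actual runtime: the holders are member variables filled in by the caller, setup( ) and eval( ) are overrides of an interface, setup( ) is called once, and eval( ) is called once per value. Every type here (IntHolderSketch, SimpleFuncSketch, AddOneSketch, LifecycleSketch) is a hypothetical stand-in, not a Drill class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a value holder; not a Drill class.
final class IntHolderSketch {
    int value;
}

// Hypothetical stand-in for the interface a simple function would implement.
interface SimpleFuncSketch {
    void setup();
    void eval();
}

// A function written in the "injected member variables" style:
// inputs and outputs are fields, not method parameters.
final class AddOneSketch implements SimpleFuncSketch {
    IntHolderSketch in = new IntHolderSketch();   // input, written by the caller
    IntHolderSketch out = new IntHolderSketch();  // output, read by the caller
    int setupCalls = 0;                           // counts setup() invocations

    @Override
    public void setup() {
        setupCalls++;
    }

    @Override
    public void eval() {
        out.value = in.value + 1;
    }
}

final class LifecycleSketch {
    // Assumed lifecycle: setup() once, then eval() once per input value.
    static List<Integer> run(AddOneSketch func, int[] column) {
        func.setup();
        List<Integer> results = new ArrayList<>();
        for (int v : column) {
            func.in.value = v;   // "inject" the input
            func.eval();
            results.add(func.out.value);
        }
        return results;
    }
}
```

If the documentation stated the real lifecycle this plainly (who creates the instance, who fills the fields, how often each method runs), the questions above would be answered.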

Step 5: "Verify that an empty drill-module.conf is included in the resources folder." This comes after the compile step, but that file won't exist unless the user adds it to their source tree. Should we include such a step, say after Step 1? Also, note that Step 2 of the previous page says the file must contain the line drill.classpath.scanning.packages += "com.yourgroupidentifier.udf" rather than being empty. Which is right?

Step 5: "add it to etc/drill/conf." This seems to be a vestige of an undocumented feature that looks for Drill configuration in that path. The more typical place is $DRILL_HOME/conf. (Or, in Drill 1.8, $DRILL_SITE.) But note that only ONE file named drill-module.conf can exist. Since the file is intended to hold module-specific config, the config dir is an awkward place for it. Overall, this is probably just plain wrong.

On this page: https://drill.apache.org/docs/tutorial-develop-a-simple-function/

Step 1: Has the same problem of naming an old version of Drill. Update it to 1.7.0, or simply say to use the user's own Drill version.

Step 3: We omit the import declaration for the Param annotation. Is it org.apache.drill.exec.expr.annotations.Param?

Step 4: As above, we need the import. org...Output? Also, Inject.

For above, when introducing a new annotation, explain that the annotations are described later in the tutorial.

Also, we should provide a link to the Javadoc for the classes and annotations described here. (Javadoc is the only way that a developer has to figure out the actual uses.) If we don't have such Javadoc on Apache, we should add it.

Step 5: Again, when is setup( ) called? Once per Drillbit? Once per query? Once per method call?

Step 5: The code is a bit strange. This is more than a doc issue; the whole code gen thing needs thinking about. It will be very difficult to debug the function if part of it is code generated only in the Drill server...

On this page: https://drill.apache.org/docs/adding-custom-functions-to-drill/

Change the following for Drill 1.8 or later:

Step 1, "copy them to <drill installation directory>/jars/3rdparty." Change to "copy them to $DRILL_SITE/jars".

(The above change moves user jars out of the Drill distribution directory, making upgrades much simpler.)

  was:
See the documentation at https://drill.apache.org/docs/develop-custom-functions-introduction/

"Simple Function: A simple function operates on a single row and produces a single row as the output."

Some explanation is needed. In SQL, the function accepts a single column and produces a new column as output: SELECT myFunc( x ) FROM y; The example string and math functions are, in fact, column (technically "scalar") functions.

Process, item 3: Explain why Drill needs the source files.

On this page: https://drill.apache.org/docs/developing-a-simple-function/

Step 1 has a Maven dependency on version 1.1.0 of Drill. It is probably obvious to most folks, but the user must replace the 1.1.0 with the version of Drill that is running on their cluster.

Step 3: it is not clear if the "bit holders" are parameters to a function or are member variables into which values are injected. Some more background about the runtime flow would help answer this question. That is, what does Drill do with the class? When is an instance created? How are values passed in?

Step 4: are setup( ) and eval( ) overrides? If so, add the standard @Override annotation to help the user understand that these are overrides. Otherwise, these might be "magic method names" (like "main"), so the user has to know to use exactly those names (and signatures).

Step 4: explain the purpose of the setup( ) method. When is it called? Once per Drillbit session? Once per fragment? Once per row? How do we intend it to be used?





> Errata, questions about UDF documentation
> -----------------------------------------
>
>                 Key: DRILL-4776
>                 URL: https://issues.apache.org/jira/browse/DRILL-4776
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 1.7.0
>            Reporter: Paul Rogers
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)