You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by paul-rogers <gi...@git.apache.org> on 2017/01/01 21:09:26 UTC

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/716

    DRILL-5116: Enable generated code debugging in each Drill operator

    DRILL-5052 added the ability to debug generated code. The reviewer suggested permitting the technique to be used for all Drill operators. This PR provides the required fixes. Most were small changes, others dealt with the rather clever way that the existing byte-code merge converted static nested classes to non-static inner classes, with the way that constructors were inserted at the byte-code level and so on. See the JIRA for the details.
    
    This code passed the unit tests twice: once with the traditional byte-code manipulations, a second time using "plain-old Java" code compilation. Plain-old Java is turned off by default, but can be turned on for all operators with a single config change: see the JIRA for info. Consider the plain-old Java option to be experimental: very handy for debugging, perhaps not quite tested enough for production use.
    
    One or two operators were not converted as they appear to not be used at present, so no tests exercise them. See the JIRA for details.
    
    This fix required visiting all operators. All generated methods declare exceptions. For plain-old Java, the based class methods must also declare exceptions. That afforded the opportunity to fix compiler warnings about resources (our over-use of {{AutoCloseable}}), missing type parameters, unused variables, and so on.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5116

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #716
    
----
commit 020e75cc5a538a0271dc78b361622b64ca6c2c65
Author: Paul Rogers <pr...@maprtech.com>
Date:   2016-12-13T01:30:56Z

    DRILL-5116: Enable generated code debugging in each Drill operator
    
    DRILL-5052 added the ability to debug generated code. The reviewer suggested
    permitting the technique to be used for all Drill operators. This PR provides
    the required fixes. Most were small changes, others dealt with the rather
    clever way that the existing byte-code merge converted static nested classes
    to non-static inner classes, with the way that constructors were inserted
    at the byte-code level and so on. See the JIRA for the details.
    
    This code passed the unit tests twice: once with the traditional byte-code
    manipulations, a second time using "plain-old Java" code compilation.
    Plain-old Java is turned off by default, but can be turned on for all
    operators with a single config change: see the JIRA for info. Consider
    the plain-old Java option to be experimental: very handy for debugging,
    perhaps not quite tested enough for production use.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by Ben-Zvi <gi...@git.apache.org>.
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94709919
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -167,24 +167,61 @@ drill.exec: {
         debug: true,
         janino_maxsize: 262144,
         cache_max_size: 1000,
    -    // Enable to write generated source to disk. See ClassBuilder
    -    save_source: false,
         // Where to save the generated source. See ClassBuilder
         code_dir: "/tmp/drill/codegen"
    +    // Disable code cache. Only for testing.
    +    disable_cache: false,
    +    // Use plain Java compilation where available
    +    prefer_plain_java: false
       },
       sort: {
         purge.threshold : 1000,
         external: {
    -      batch.size : 4000,
    +      // Drill uses the managed External Sort Batch by default.
    --- End diff --
    
    The following two changes are not related to DRILL-5116.  Should they be moved to another PR ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95048162
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,36 +67,33 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    + * compilation. Once this work is complete, the calls to
    + * <tt>plainOldJavaCapable<tt> can be removed as all generated classes
    + * will be capable.
      * <p>
    - * The setting to prefer plain-old Java is ignored for generated
    + * The setting to prefer plain-old Java is ignored for any remaining generated
      * classes not marked as plain-old Java capable.
      */
     
     public class ClassBuilder {
     
    -  public static final String SAVE_CODE_OPTION = CodeCompiler.COMPILE_BASE + ".save_source";
       public static final String CODE_DIR_OPTION = CodeCompiler.COMPILE_BASE + ".code_dir";
     
       private final DrillConfig config;
       private final OptionManager options;
    -  private final boolean saveCode;
       private final File codeDir;
     
       public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         this.config = config;
         options = optionManager;
     
    -    // The option to save code is a boot-time option because
    -    // it is used selectively during debugging, but can cause
    -    // excessive I/O in a running server if used to save all code.
    +    // Code can be saved per-class to enable debugging.
    +    // Just mark the code generator as to be persisted,
    +    // point your debugger to the directory set below, and you
    --- End diff --
    
    Since the code generator uses a sequence number as part of the class name (ProjectGen5.java etc), is it possible that in different run we may end up with different class name? Will it cause issue when debug? If that's the case, probably document such possiblity so that people is aware of. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95469122
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -141,7 +142,9 @@ public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         // A key advantage of this method is that the code can be
         // saved and debugged, if needed.
     
    -    saveCode(code, name);
    +    if (cg.persistCode()) {
    --- End diff --
    
    Method names changed to `saveCodeForDebugging` and `isCodeToBeSaved`. More of  a mouthful, but hopefully this will be clearer -- also clearer that code is saved only for the purpose of debugging (not, for example, as a code cache that spans Drillbit runs, etc.) I'd kind of hoped the Javadoc would explain all that...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95053577
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -167,10 +167,12 @@ drill.exec: {
         debug: true,
         janino_maxsize: 262144,
         cache_max_size: 1000,
    -    // Enable to write generated source to disk. See ClassBuilder
    -    save_source: false,
         // Where to save the generated source. See ClassBuilder
         code_dir: "/tmp/drill/codegen"
    --- End diff --
    
    Using Eclipse, the /tmp/drill/codegen directory is very simple: Eclipse prompts for the source location on the first run, remembers it thereafter, and MacOS cleans up any left-over generated files on the next restart.
    
    Looks like things are a bit more difficult in IntelliJ. I'll play around a bit to see if I can find a solution that works well for both tools. I'll create a separate JIRA and PR for that work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94877010
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassCompilerSelector.java ---
    @@ -119,13 +119,18 @@ public ClassCompilerSelector(ClassLoader classLoader, DrillConfig config, Option
         byte[][] bc = getCompiler(sourceCode).getClassByteCode(className, sourceCode);
     
         // Uncomment the following to save the generated byte codes.
    -
    -//    final String baseDir = System.getProperty("java.io.tmpdir") + File.separator + className;
    -//    File classFile = new File(baseDir + className.clazz);
    -//    classFile.getParentFile().mkdirs();
    -//    try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(classFile))) {
    -//      out.write(bc[0]);
    +    // Use the JDK javap command to view the generated code.
    +    // This is the code from the compiler before byte code manipulations.
    --- End diff --
    
    Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by Ben-Zvi <gi...@git.apache.org>.
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94697483
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/QueryClassLoader.java ---
    @@ -58,6 +58,19 @@ public void injectByteCode(String className, byte[] classBytes) throws IOExcepti
           throw new IOException(String.format("The class defined %s has already been loaded.", className));
         }
         customClasses.put(className, classBytes);
    +
    --- End diff --
    
    Maybe add a comment that for the "before manipulations BC" see ClassCompilerSelector.java (line 122) 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95046588
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -167,10 +167,12 @@ drill.exec: {
         debug: true,
         janino_maxsize: 262144,
         cache_max_size: 1000,
    -    // Enable to write generated source to disk. See ClassBuilder
    -    save_source: false,
         // Where to save the generated source. See ClassBuilder
         code_dir: "/tmp/drill/codegen"
    --- End diff --
    
    Writing to an external directory may require to add such source directory to the IDE project each time when people try to debug the generated code. 
    
    One way I tried is to put the generated source under $drill-java-exec/target/generated-sources, which is currently for freemarker's generated source.  That way, Intellij is able to load these new generated class automatically when you debug. 
    
    The above approach requires us to exclude those runtime generated class from the distribution jars, by adding the following in java-exec pom.xml. 
    
    ```
                      <excludes>
                        <exclude>**/test/generated/**</exclude>
                      </excludes>
    ``` 
    
    The above comment is optional.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95053721
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -141,7 +142,9 @@ public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         // A key advantage of this method is that the code can be
         // saved and debugged, if needed.
     
    -    saveCode(code, name);
    +    if (cg.persistCode()) {
    --- End diff --
    
    It is the method to ask that the generated code be saved... Perhaps the confusing bit is that there are two methods of the same name: `persistCode(boolean)` which is called to ask to persist the code, and `persistCode()` which is used to check if the code should be persisted. (Somehow, `isPersistCode` or `isCodePersisted` just didn't seem as clear...)
    
    On the next PR, I'll see if I can come up with a clearer name. At that time, I'll find a way to test the two "dead" operators so that all support plain Java. Once that is done, I can remove the calls to `supportsPlainJava` and comments about using `persistCode` that now appear in each operator to mark those that are plain-Java ready.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95459877
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
    @@ -41,12 +39,87 @@
      */
     
     public class CodeCompiler {
    +  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CodeCompiler.class);
    +
    +  /**
    +   * Abstracts out the details of compiling code using the two available
    +   * mechanisms. Allows this mechanism to be unit tested separately from
    +   * the code cache.
    +   */
    +
    +  public static class CodeGenCompiler {
    +    private final ClassTransformer transformer;
    +    private final ClassBuilder classBuilder;
    +
    +    public CodeGenCompiler(final DrillConfig config, final OptionManager optionManager) {
    +      transformer = new ClassTransformer(config, optionManager);
    +      classBuilder = new ClassBuilder(config, optionManager);
    +    }
    +
    +    /**
    +     * Compile the code already generated by the code generator.
    +     *
    +     * @param cg the code generator for the class
    +     * @return the compiled class
    +     * @throws Exception if anything goes wrong
    +     */
    +
    +    public Class<?> compile(final CodeGenerator<?> cg) throws Exception {
    +       if (cg.isPlainOldJava()) {
    --- End diff --
    
    Since we now have to modes for runtime code generation across almost all Drill operators (plain-old vs bytecode transformer), is there a way to check which mode is used for each operator in a query? It would be nice to include such information in log, to help analyze a potential issue (whether there is a bug in the bytecode transformer, or plain-old style class). 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95039960
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/compile/TestClassTransformation.java ---
    @@ -44,6 +44,9 @@
     
       @BeforeClass
       public static void beforeTestClassTransformation() throws Exception {
    +    // Tests here require the byte-code merge technique and are meaningless
    +    // if the plain-old Java technique is selected.
    +    System.setProperty("drill.compile.prefer_plain_java", "false");
    --- End diff --
    
    should this be drill.exec.compile.prefer_plain_java  ("exec" is missing). 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95053687
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,36 +67,33 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    + * compilation. Once this work is complete, the calls to
    + * <tt>plainOldJavaCapable<tt> can be removed as all generated classes
    + * will be capable.
      * <p>
    - * The setting to prefer plain-old Java is ignored for generated
    + * The setting to prefer plain-old Java is ignored for any remaining generated
      * classes not marked as plain-old Java capable.
      */
     
     public class ClassBuilder {
     
    -  public static final String SAVE_CODE_OPTION = CodeCompiler.COMPILE_BASE + ".save_source";
       public static final String CODE_DIR_OPTION = CodeCompiler.COMPILE_BASE + ".code_dir";
     
       private final DrillConfig config;
       private final OptionManager options;
    -  private final boolean saveCode;
       private final File codeDir;
     
       public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         this.config = config;
         options = optionManager;
     
    -    // The option to save code is a boot-time option because
    -    // it is used selectively during debugging, but can cause
    -    // excessive I/O in a running server if used to save all code.
    +    // Code can be saved per-class to enable debugging.
    +    // Just mark the code generator as to be persisted,
    +    // point your debugger to the directory set below, and you
    --- End diff --
    
    It has not been an issue with Eclipse. I wonder if it will be for IntelliJ? Eclipse cares about the generated sources only when we step into them. Otherwise, Eclipse completely ignores the zillions of old generated files that might be lying around. And, since we generate a new copy of the file each time we generate it, the new copy will replace the old one. Even if Eclipse had the old file open from a previous session, it will replace the old one with the new one once Eclipse switches to that window.
    
    Again, I need to look into IntelliJ to see if a more elaborate solution is needed for that tool.
    
    I like your suggestion to document this. Once all this is committed, I'll submit a JIRA to add documentation about how to use this, along with the IntelliJ issues, to the Developer Information page on the Drill web site.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #716: DRILL-5116: Enable generated code debugging in each Drill ...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on the issue:

    https://github.com/apache/drill/pull/716
  
    In the current code generator, we log both the source code and the size of source / byte code, the time spent on compiling, and the complier (Janino vs JDK) at debug level. It's probably debatable to log the source code, but the other information is quite useful for analyzing customer issue. In the plain-old style generator, seems we do not put those information in debug log. For now, the patch is not enabled. But in the future, if we are going to use this new code generator in production, we had better log more information( either in the log, or in operator profile), as that will help improve service workload. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/716


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by Ben-Zvi <gi...@git.apache.org>.
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94708618
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,14 +67,12 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    + * compilation. Once this work is complete, the calls to
    + * <tt>plainOldJavaCapable<tt> can be removed as all generated classes
    + * will be capable.
      * <p>
    - * The setting to prefer plain-old Java is ignored for generated
    + * The setting to prefer plain-old Java is ignored for any remaining generated
    --- End diff --
    
    Just a future comment: When eventually the "plain old java" method takes over and the byte code manipulation code is eliminated, then the meaning of "plain old java" won't be clear. We should start thinking about a better term then.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94879598
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -167,24 +167,61 @@ drill.exec: {
         debug: true,
         janino_maxsize: 262144,
         cache_max_size: 1000,
    -    // Enable to write generated source to disk. See ClassBuilder
    -    save_source: false,
         // Where to save the generated source. See ClassBuilder
         code_dir: "/tmp/drill/codegen"
    +    // Disable code cache. Only for testing.
    +    disable_cache: false,
    +    // Use plain Java compilation where available
    +    prefer_plain_java: false
       },
       sort: {
         purge.threshold : 1000,
         external: {
    -      batch.size : 4000,
    +      // Drill uses the managed External Sort Batch by default.
    --- End diff --
    
    Fixed. Leaked in from the original development branch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #716: DRILL-5116: Enable generated code debugging in each Drill ...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/716
  
    Looks good. Thank you, Paul!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94879149
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java ---
    @@ -374,6 +388,124 @@ public HoldingContainer declare(MajorType t, boolean includeNewInstance) {
         return this.workspaceVectors;
       }
     
    +  /**
    +   * Prepare the generated class for use as a plain-old Java class
    +   * (to be compiled by a compiler and directly loaded without a
    +   * byte-code merge. Three additions are necessary:
    +   * <ul>
    +   * <li>The class must extend its template as we won't merge byte
    +   * codes.</li>
    +   * <li>A constructor is required to call the <tt>__DRILL_INIT__</tt>
    +   * method. If this is a nested class, then the constructor must
    +   * include parameters defined by the base class.</li>
    +   * <li>For each nested class, create a method that creates an
    +   * instance of that nested class using a well-defined name. This
    +   * method overrides the base class method defined for this purpose.</li>
    +   */
    +
    +  public void preparePlainJava() {
    +
    +    // If this generated class uses the "straight Java" technique
    +    // (no byte code manipulation), then the class must extend the
    +    // template so it plays by normal Java rules for finding the
    +    // template methods via inheritance rather than via code injection.
    +
    +    Class<?> baseClass = sig.getSignatureClass();
    +    clazz._extends(baseClass);
    +
    +    // Create a constuctor for the class: either a default one,
    +    // or (for nested classes) one that passes along arguments to
    +    // the super class constructor.
    +
    +    Constructor<?>[] ctors = baseClass.getConstructors();
    +    for (Constructor<?> ctor : ctors) {
    +      addCtor(ctor.getParameterTypes());
    +    }
    +
    +    // Some classes have no declared constructor, but we need to generate one
    +    // anyway.
    +
    +    if ( ctors.length == 0 ) {
    +      addCtor( new Class<?>[] {} );
    +    }
    +
    +    // Repeat for inner classes.
    +
    +    for(ClassGenerator<T> child : innerClasses.values()) {
    +      child.preparePlainJava();
    +
    +      // If there are inner classes, then we need to generate a "shim" method
    +      // to instantiate that class.
    +      //
    +      // protected TemplateClass.TemplateInnerClass newTemplateInnerClass( args... ) {
    +      //    return new GeneratedClass.GeneratedInnerClass( args... );
    +      // }
    +      //
    +      // The name is special, it is "new" + inner class name. The template must
    +      // provide a method of this name that creates the inner class instance.
    +
    +      String innerClassName = child.clazz.name();
    +      JMethod shim = clazz.method(JMod.PROTECTED, child.sig.getSignatureClass(), "new" + innerClassName);
    +      JInvocation childNew = JExpr._new(child.clazz);
    +      Constructor<?>[] childCtors = child.sig.getSignatureClass().getConstructors();
    +      Class<?>[] params;
    +      if (childCtors.length==0) {
    +        params = new Class<?>[0];
    +      } else {
    +        params = childCtors[0].getParameterTypes();
    +      }
    +      for (int i = 1; i < params.length; i++) {
    +        Class<?> p = params[i];
    +        childNew.arg(shim.param(model._ref(p), "arg" + i));
    +      }
    +      shim.body()._return(childNew);
    +    }
    +  }
    +
    +  /**
    +   * The code generator creates a method called __DRILL_INIT__ which takes the
    +   * place of the constructor when the code goes though the byte code merge.
    +   * For Plain-old Java, we call the method from a constructor created for
    +   * that purpose. (Generated code, fortunately, never includes a constructor,
    +   * so we can create one.) Since the init block throws an exception (which
    +   * should never occur), the generated constructor converts the checked
    +   * exception into an unchecked one so as to not require changes to the
    +   * various places that create instances of the generated classes.
    +   *
    +   * Example:<code><pre>
    +   * public StreamingAggregatorGen1() {
    +   *       try {
    +   *         __DRILL_INIT__();
    +   *     } catch (SchemaChangeException e) {
    +   *         throw new UnsupportedOperationException(e);
    +   *     }
    +   * }</pre></code>
    +   *
    +   * Note: in Java 8 we'd use the <tt>Parameter</tt> class defined in Java's
    +   * introspection package. But, Drill prefers Java 7 which only provides
    +   * parameter types.
    +   */
    +
    +  private void addCtor(Class<?>[] parameters) {
    +    JMethod ctor = clazz.constructor(JMod.PUBLIC);
    +    JInvocation superCall = null;
    --- End diff --
    
    Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by Ben-Zvi <gi...@git.apache.org>.
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94520899
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -82,18 +83,18 @@
     
       private final DrillConfig config;
       private final OptionManager options;
    -  private final boolean saveCode;
       private final File codeDir;
     
       public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         this.config = config;
         options = optionManager;
     
    -    // The option to save code is a boot-time option because
    -    // it is used selectively during debugging, but can cause
    -    // excessive I/O in a running server if used to save all code.
    +    // Code can be saved per-class to enable debugging.
    +    // Just mark the code generator as to be persistented,
    +    // point your debugger to the directory set below, and you
    +    // can step into the code for debugging. Code is not saved
    +    // be default because doing so is expensive and unnecessary.
     
    -    saveCode = config.getBoolean(SAVE_CODE_OPTION);
    --- End diff --
    
    Should also delete line 81 above -- the definition of SAVE_CODE_OPTION 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by Ben-Zvi <gi...@git.apache.org>.
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94697603
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassCompilerSelector.java ---
    @@ -119,13 +119,18 @@ public ClassCompilerSelector(ClassLoader classLoader, DrillConfig config, Option
         byte[][] bc = getCompiler(sourceCode).getClassByteCode(className, sourceCode);
     
         // Uncomment the following to save the generated byte codes.
    -
    -//    final String baseDir = System.getProperty("java.io.tmpdir") + File.separator + className;
    -//    File classFile = new File(baseDir + className.clazz);
    -//    classFile.getParentFile().mkdirs();
    -//    try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(classFile))) {
    -//      out.write(bc[0]);
    +    // Use the JDK javap command to view the generated code.
    +    // This is the code from the compiler before byte code manipulations.
    --- End diff --
    
    Maybe add a comment that for the "after BC manipulations" see similar code in  QueryClassLoader.java .



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95472935
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestConvertFunctions.java ---
    @@ -76,6 +75,22 @@
     
       String textFileContent;
     
    +  @BeforeClass
    +  public static void setup( ) {
    +    // Tests here rely on the byte-code merge approach to code
    +    // generation and will fail if using plain-old Java.
    +    // Actually, some queries succeed with plain-old Java that
    +    // fail with scalar replacement, but the tests check for the
    +    // scalar replacement failure and, not finding it, fail the
    +    // test.
    +    //
    +    // The setting here forces byte-code merge even if the
    +    // config file asks for plain-old Java.
    +    // TODO: Fix the tests to handle both cases.
    +
    +    System.setProperty("drill.compile.prefer_plain_java", "false");
    --- End diff --
    
    Fixed as above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95048372
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -141,7 +142,9 @@ public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         // A key advantage of this method is that the code can be
         // saved and debugged, if needed.
     
    -    saveCode(code, name);
    +    if (cg.persistCode()) {
    --- End diff --
    
    I found "persistCode()" a bit confusing. From the method name, I thought it's the method to persist the code to directory, but looks like it's the method to check if we want to persistCode; the actual one is saveCode(). 
    
     


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95047940
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,36 +67,33 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    --- End diff --
    
    Do all drill operators support POJ class with this patch (the comment said "most")? If not, please list the one not supported in the comment. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95471607
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
    @@ -41,12 +39,87 @@
      */
     
     public class CodeCompiler {
    +  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CodeCompiler.class);
    +
    +  /**
    +   * Abstracts out the details of compiling code using the two available
    +   * mechanisms. Allows this mechanism to be unit tested separately from
    +   * the code cache.
    +   */
    +
    +  public static class CodeGenCompiler {
    +    private final ClassTransformer transformer;
    +    private final ClassBuilder classBuilder;
    +
    +    public CodeGenCompiler(final DrillConfig config, final OptionManager optionManager) {
    +      transformer = new ClassTransformer(config, optionManager);
    +      classBuilder = new ClassBuilder(config, optionManager);
    +    }
    +
    +    /**
    +     * Compile the code already generated by the code generator.
    +     *
    +     * @param cg the code generator for the class
    +     * @return the compiled class
    +     * @throws Exception if anything goes wrong
    +     */
    +
    +    public Class<?> compile(final CodeGenerator<?> cg) throws Exception {
    +       if (cg.isPlainOldJava()) {
    --- End diff --
    
    Done. Trace logging will identify if the class was found in the code cache, or compile via plain-Java or byte code manipulation. Also logged the compiler selection and debug code options (which are set statically). This should provide most information needed to understand a run.
    
    Also added the requested information about a plain-Java compilation: the byte code size and elapsed time. Revised both log messages to include class name to provide some context for the compilation messages.
    
    The missing information is whether a particular class was compiled with Janino or JDK.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95053626
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,36 +67,33 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    + * compilation. Once this work is complete, the calls to
    + * <tt>plainOldJavaCapable<tt> can be removed as all generated classes
    + * will be capable.
      * <p>
    - * The setting to prefer plain-old Java is ignored for generated
    + * The setting to prefer plain-old Java is ignored for any remaining generated
    --- End diff --
    
    See above & the JIRA entry.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94876735
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
    @@ -153,4 +242,17 @@ public GeneratedClassEntry(final Class<?> clazz) {
       public void flushCache() {
         cache.invalidateAll();
       }
    +
    +  /**
    +   * Upon close, report the effectiveness of the code cache to the log.
    +   */
    +
    +  public void close() {
    +    logger.info("Class gen count: " + classGenCount);
    --- End diff --
    
    Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94874974
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,14 +67,12 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    + * compilation. Once this work is complete, the calls to
    + * <tt>plainOldJavaCapable<tt> can be removed as all generated classes
    + * will be capable.
      * <p>
    - * The setting to prefer plain-old Java is ignored for generated
    + * The setting to prefer plain-old Java is ignored for any remaining generated
    --- End diff --
    
    True. "Plain old Java" here is a reference to the commonly-used "POJO" in Java: "Plain Old Java Object." Could have called it "Plain Old Java Class"...
    
    Yes, if/when we switch to this technique by default then, by definition, all operators must be able to generate such code and we won't need the method that declares that support (nor the code that chooses the compilation path.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94875162
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -82,18 +83,18 @@
     
       private final DrillConfig config;
       private final OptionManager options;
    -  private final boolean saveCode;
       private final File codeDir;
     
       public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         this.config = config;
         options = optionManager;
     
    -    // The option to save code is a boot-time option because
    -    // it is used selectively during debugging, but can cause
    -    // excessive I/O in a running server if used to save all code.
    +    // Code can be saved per-class to enable debugging.
    +    // Just mark the code generator as to be persistented,
    +    // point your debugger to the directory set below, and you
    +    // can step into the code for debugging. Code is not saved
    +    // be default because doing so is expensive and unnecessary.
     
    -    saveCode = config.getBoolean(SAVE_CODE_OPTION);
    --- End diff --
    
    Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95047989
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,36 +67,33 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    + * compilation. Once this work is complete, the calls to
    + * <tt>plainOldJavaCapable<tt> can be removed as all generated classes
    + * will be capable.
      * <p>
    - * The setting to prefer plain-old Java is ignored for generated
    + * The setting to prefer plain-old Java is ignored for any remaining generated
    --- End diff --
    
    It would be good if we know what "any remaining" are exactly. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95490516
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,36 +67,33 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    + * compilation. Once this work is complete, the calls to
    + * <tt>plainOldJavaCapable<tt> can be removed as all generated classes
    + * will be capable.
      * <p>
    - * The setting to prefer plain-old Java is ignored for generated
    + * The setting to prefer plain-old Java is ignored for any remaining generated
      * classes not marked as plain-old Java capable.
      */
     
     public class ClassBuilder {
     
    -  public static final String SAVE_CODE_OPTION = CodeCompiler.COMPILE_BASE + ".save_source";
       public static final String CODE_DIR_OPTION = CodeCompiler.COMPILE_BASE + ".code_dir";
     
       private final DrillConfig config;
       private final OptionManager options;
    -  private final boolean saveCode;
       private final File codeDir;
     
       public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         this.config = config;
         options = optionManager;
     
    -    // The option to save code is a boot-time option because
    -    // it is used selectively during debugging, but can cause
    -    // excessive I/O in a running server if used to save all code.
    +    // Code can be saved per-class to enable debugging.
    +    // Just mark the code generator as to be persisted,
    +    // point your debugger to the directory set below, and you
    --- End diff --
    
    It's probably fine to leave as it is now with proper document usage. If more people uses this feature and run into the class name difference issue, we can do that at that time. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95442239
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -141,7 +142,9 @@ public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         // A key advantage of this method is that the code can be
         // saved and debugged, if needed.
     
    -    saveCode(code, name);
    +    if (cg.persistCode()) {
    --- End diff --
    
    Sounds to me that `persistCode(boolean)` is to set the persist boolean flag in code generator if it's allowed, while `persistCode()` is to check if the boolean flag is turned on or not. In such case, can we call the former as `setPersistMode(boolean)`, and call the later as `isPersistable`? 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by Ben-Zvi <gi...@git.apache.org>.
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94695372
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
    @@ -117,18 +200,24 @@ public CodeCompiler(final DrillConfig config, final OptionManager optionManager)
       private class Loader extends CacheLoader<CodeGenerator<?>, GeneratedClassEntry> {
         @Override
         public GeneratedClassEntry load(final CodeGenerator<?> cg) throws Exception {
    -      final Class<?> c;
    -      if ( cg.isPlainOldJava( ) ) {
    -        // Generate class as plain old Java
    +      return makeClass(cg);
    +    }
    +  }
     
    -        c = classBuilder.getImplementationClass(cg);
    -      } else {
    -        // Generate class parts and assemble byte-codes.
    +  /**
    +   * Called when the requested class does not exist in the class and should
    --- End diff --
    
    "in the cache", not "in the class" 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95461369
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,36 +67,33 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    + * compilation. Once this work is complete, the calls to
    + * <tt>plainOldJavaCapable<tt> can be removed as all generated classes
    + * will be capable.
      * <p>
    - * The setting to prefer plain-old Java is ignored for generated
    + * The setting to prefer plain-old Java is ignored for any remaining generated
      * classes not marked as plain-old Java capable.
      */
     
     public class ClassBuilder {
     
    -  public static final String SAVE_CODE_OPTION = CodeCompiler.COMPILE_BASE + ".save_source";
       public static final String CODE_DIR_OPTION = CodeCompiler.COMPILE_BASE + ".code_dir";
     
       private final DrillConfig config;
       private final OptionManager options;
    -  private final boolean saveCode;
       private final File codeDir;
     
       public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         this.config = config;
         options = optionManager;
     
    -    // The option to save code is a boot-time option because
    -    // it is used selectively during debugging, but can cause
    -    // excessive I/O in a running server if used to save all code.
    +    // Code can be saved per-class to enable debugging.
    +    // Just mark the code generator as to be persisted,
    +    // point your debugger to the directory set below, and you
    --- End diff --
    
    What we talked about probably is slightly different. What you mean is to step into the generated code, without setting a breaking point in it.  What I mean is first run the query and have the code generated, then I setup some breakpoint in the code generated, re-run the query and start debugs. In such cases, we need make sure the two runs of query get the same class names, otherwise the breakpoint will not be effective. 
      
    Either way, it would be good to doc the typical ways of using the generated code in debug mode. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95472493
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/compile/TestClassTransformation.java ---
    @@ -44,6 +44,9 @@
     
       @BeforeClass
       public static void beforeTestClassTransformation() throws Exception {
    +    // Tests here require the byte-code merge technique and are meaningless
    +    // if the plain-old Java technique is selected.
    +    System.setProperty("drill.compile.prefer_plain_java", "false");
    --- End diff --
    
    You are right. Replaced the name with the constant defined in `CodeCompiler` as I should have done in the first place.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94877069
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/QueryClassLoader.java ---
    @@ -58,6 +58,19 @@ public void injectByteCode(String className, byte[] classBytes) throws IOExcepti
           throw new IOException(String.format("The class defined %s has already been loaded.", className));
         }
         customClasses.put(className, classBytes);
    +
    --- End diff --
    
    Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by Ben-Zvi <gi...@git.apache.org>.
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94696407
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
    @@ -153,4 +242,17 @@ public GeneratedClassEntry(final Class<?> clazz) {
       public void flushCache() {
         cache.invalidateAll();
       }
    +
    +  /**
    +   * Upon close, report the effectiveness of the code cache to the log.
    +   */
    +
    +  public void close() {
    +    logger.info("Class gen count: " + classGenCount);
    --- End diff --
    
    A very minor suggestion: Put all three loggings on a single log line (for sake of making the log more compact; indeed this is not the main offender ....)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #716: DRILL-5116: Enable generated code debugging in each Drill ...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on the issue:

    https://github.com/apache/drill/pull/716
  
    +1
    
    The patch looks good to me. Thanks for the PR, @paul-rogers !
    
    I'll squash and re-run the regression, before merge it to master branch. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95040108
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestConvertFunctions.java ---
    @@ -76,6 +75,22 @@
     
       String textFileContent;
     
    +  @BeforeClass
    +  public static void setup( ) {
    +    // Tests here rely on the byte-code merge approach to code
    +    // generation and will fail if using plain-old Java.
    +    // Actually, some queries succeed with plain-old Java that
    +    // fail with scalar replacement, but the tests check for the
    +    // scalar replacement failure and, not finding it, fail the
    +    // test.
    +    //
    +    // The setting here forces byte-code merge even if the
    +    // config file asks for plain-old Java.
    +    // TODO: Fix the tests to handle both cases.
    +
    +    System.setProperty("drill.compile.prefer_plain_java", "false");
    --- End diff --
    
    Same as above comment. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by Ben-Zvi <gi...@git.apache.org>.
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94706789
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java ---
    @@ -374,6 +388,124 @@ public HoldingContainer declare(MajorType t, boolean includeNewInstance) {
         return this.workspaceVectors;
       }
     
    +  /**
    +   * Prepare the generated class for use as a plain-old Java class
    +   * (to be compiled by a compiler and directly loaded without a
    +   * byte-code merge. Three additions are necessary:
    +   * <ul>
    +   * <li>The class must extend its template as we won't merge byte
    +   * codes.</li>
    +   * <li>A constructor is required to call the <tt>__DRILL_INIT__</tt>
    +   * method. If this is a nested class, then the constructor must
    +   * include parameters defined by the base class.</li>
    +   * <li>For each nested class, create a method that creates an
    +   * instance of that nested class using a well-defined name. This
    +   * method overrides the base class method defined for this purpose.</li>
    +   */
    +
    +  public void preparePlainJava() {
    +
    +    // If this generated class uses the "straight Java" technique
    +    // (no byte code manipulation), then the class must extend the
    +    // template so it plays by normal Java rules for finding the
    +    // template methods via inheritance rather than via code injection.
    +
    +    Class<?> baseClass = sig.getSignatureClass();
    +    clazz._extends(baseClass);
    +
    +    // Create a constuctor for the class: either a default one,
    +    // or (for nested classes) one that passes along arguments to
    +    // the super class constructor.
    +
    +    Constructor<?>[] ctors = baseClass.getConstructors();
    +    for (Constructor<?> ctor : ctors) {
    +      addCtor(ctor.getParameterTypes());
    +    }
    +
    +    // Some classes have no declared constructor, but we need to generate one
    +    // anyway.
    +
    +    if ( ctors.length == 0 ) {
    +      addCtor( new Class<?>[] {} );
    +    }
    +
    +    // Repeat for inner classes.
    +
    +    for(ClassGenerator<T> child : innerClasses.values()) {
    +      child.preparePlainJava();
    +
    +      // If there are inner classes, then we need to generate a "shim" method
    +      // to instantiate that class.
    +      //
    +      // protected TemplateClass.TemplateInnerClass newTemplateInnerClass( args... ) {
    +      //    return new GeneratedClass.GeneratedInnerClass( args... );
    +      // }
    +      //
    +      // The name is special, it is "new" + inner class name. The template must
    +      // provide a method of this name that creates the inner class instance.
    +
    +      String innerClassName = child.clazz.name();
    +      JMethod shim = clazz.method(JMod.PROTECTED, child.sig.getSignatureClass(), "new" + innerClassName);
    +      JInvocation childNew = JExpr._new(child.clazz);
    +      Constructor<?>[] childCtors = child.sig.getSignatureClass().getConstructors();
    +      Class<?>[] params;
    +      if (childCtors.length==0) {
    +        params = new Class<?>[0];
    +      } else {
    +        params = childCtors[0].getParameterTypes();
    +      }
    +      for (int i = 1; i < params.length; i++) {
    +        Class<?> p = params[i];
    +        childNew.arg(shim.param(model._ref(p), "arg" + i));
    +      }
    +      shim.body()._return(childNew);
    +    }
    +  }
    +
    +  /**
    +   * The code generator creates a method called __DRILL_INIT__ which takes the
    +   * place of the constructor when the code goes though the byte code merge.
    +   * For Plain-old Java, we call the method from a constructor created for
    +   * that purpose. (Generated code, fortunately, never includes a constructor,
    +   * so we can create one.) Since the init block throws an exception (which
    +   * should never occur), the generated constructor converts the checked
    +   * exception into an unchecked one so as to not require changes to the
    +   * various places that create instances of the generated classes.
    +   *
    +   * Example:<code><pre>
    +   * public StreamingAggregatorGen1() {
    +   *       try {
    +   *         __DRILL_INIT__();
    +   *     } catch (SchemaChangeException e) {
    +   *         throw new UnsupportedOperationException(e);
    +   *     }
    +   * }</pre></code>
    +   *
    +   * Note: in Java 8 we'd use the <tt>Parameter</tt> class defined in Java's
    +   * introspection package. But, Drill prefers Java 7 which only provides
    +   * parameter types.
    +   */
    +
    +  private void addCtor(Class<?>[] parameters) {
    +    JMethod ctor = clazz.constructor(JMod.PUBLIC);
    +    JInvocation superCall = null;
    --- End diff --
    
    A minor suggestion: Write all the code between lines 491-502 inclusive inside a single if statement:
    ```
      // if there are parameters, need to pass them to the super class   
      if ( parameters.length > 0 ) {   
        JInvocation superCall = JExpr.invoke("super");
        ......   
      }   
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95053621
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,36 +67,33 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    --- End diff --
    
    Hard to put that list anywhere but in the operators that have the issues (else we have to keep the comments in sync.) There are two. The old `SortBatch`. Also `OrderedPartitionRecordBatch`, which seems to not be used at present. These are called out in the JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95442410
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/compile/TestClassTransformation.java ---
    @@ -44,6 +44,9 @@
     
       @BeforeClass
       public static void beforeTestClassTransformation() throws Exception {
    +    // Tests here require the byte-code merge technique and are meaningless
    +    // if the plain-old Java technique is selected.
    +    System.setProperty("drill.compile.prefer_plain_java", "false");
    --- End diff --
    
    Any comment regarding this one? I thought the name of the options has a typo?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r94875353
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
    @@ -117,18 +200,24 @@ public CodeCompiler(final DrillConfig config, final OptionManager optionManager)
       private class Loader extends CacheLoader<CodeGenerator<?>, GeneratedClassEntry> {
         @Override
         public GeneratedClassEntry load(final CodeGenerator<?> cg) throws Exception {
    -      final Class<?> c;
    -      if ( cg.isPlainOldJava( ) ) {
    -        // Generate class as plain old Java
    +      return makeClass(cg);
    +    }
    +  }
     
    -        c = classBuilder.getImplementationClass(cg);
    -      } else {
    -        // Generate class parts and assemble byte-codes.
    +  /**
    +   * Called when the requested class does not exist in the class and should
    --- End diff --
    
    Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95479214
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,14 +67,12 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    + * compilation. Once this work is complete, the calls to
    + * <tt>plainOldJavaCapable<tt> can be removed as all generated classes
    + * will be capable.
      * <p>
    - * The setting to prefer plain-old Java is ignored for generated
    + * The setting to prefer plain-old Java is ignored for any remaining generated
    --- End diff --
    
    Since I had to rename other method; went ahead and changed references to "plain-old Java" to just "plain Java."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #716: DRILL-5116: Enable generated code debugging in each...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/716#discussion_r95468443
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/ClassBuilder.java ---
    @@ -64,36 +67,33 @@
      * local variables. Have fun!</li>
      * </ul>
      * <p>
    - * Note: not all generated code is ready to be compiled as plain-old
    - * Java. Some classes omit from the template the proper <code>throws</code>
    - * declarations. Other minor problems may also crop up. All are easy
    - * to fix. Once you've done so, add the following to mark that you've
    - * done the clean-up:<pre>
    - * cg.plainOldJavaCapable(true); // Class supports plain-old Java</pre>
    + * Most generated classes have been upgraded to support Plain-old Java
    + * compilation. Once this work is complete, the calls to
    + * <tt>plainOldJavaCapable<tt> can be removed as all generated classes
    + * will be capable.
      * <p>
    - * The setting to prefer plain-old Java is ignored for generated
    + * The setting to prefer plain-old Java is ignored for any remaining generated
      * classes not marked as plain-old Java capable.
      */
     
     public class ClassBuilder {
     
    -  public static final String SAVE_CODE_OPTION = CodeCompiler.COMPILE_BASE + ".save_source";
       public static final String CODE_DIR_OPTION = CodeCompiler.COMPILE_BASE + ".code_dir";
     
       private final DrillConfig config;
       private final OptionManager options;
    -  private final boolean saveCode;
       private final File codeDir;
     
       public ClassBuilder(DrillConfig config, OptionManager optionManager) {
         this.config = config;
         options = optionManager;
     
    -    // The option to save code is a boot-time option because
    -    // it is used selectively during debugging, but can cause
    -    // excessive I/O in a running server if used to save all code.
    +    // Code can be saved per-class to enable debugging.
    +    // Just mark the code generator as to be persisted,
    +    // point your debugger to the directory set below, and you
    --- End diff --
    
    Thanks. The key thing to remember here is that the generated code does not exist until the code generator runs, so it is somewhat hard to set a breakpoint ahead of time given the current structure.
    
    You are right that we can *add* another feature that causes the code to reuse class names so that we can set breakpoints inside the generated code. I worked around this in my sort performance stuff by creating my own copy of the code which I then hand modified to try out various changes. That was a quick way to avoid the name issue and to try changes without having to muck with the code generator just to do a prototype.
    
    Do you feel we need to add the consistent-class-name feature in this PR, or can that be a refinement in a later PR?
    
    And, I agree we should document usage as a way to encourage folks to try it and help us figure out what else we can improve.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---