You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2021/03/05 01:24:28 UTC

[GitHub] [drill] paul-rogers edited a comment on pull request #2183: DRILL-6547: IllegalStateException: Tried to remove unmanaged buffer in concat function

paul-rogers edited a comment on pull request #2183:
URL: https://github.com/apache/drill/pull/2183#issuecomment-791072405


   Thanks for fixing this.
   
   I suspect the problem here runs deeper than this particular fix. This is Drill: nothing is simple, things build on hacks which build on undocumented complexities. I suspect that is the issue here.
   
   In particular, we need to know the rules for buffer ownership in UDFs which is, of course, not documented. The only reasonable way to learn this is to run a query, save the generated Java code and inspect the calls. Warning: the generated code is a huge mess.
   
   The key question is this: when you declare an `@Inject` variable (`buffer` in your case), does the runtime provide an initial buffer? My guess is that it does in the form of a "managed" buffer. (This just means that there is some auto-free mechanism.) I believe that the buffer is allocated on each call, but the generated code will say for sure. (The UDF code is copied into the generated code.)
   
   The function declares an `@Output`, in this case `out`. If this variable provides a buffer, it would seem that ownership has to pass to the caller, since the caller will copy that buffer into a value vector. My guess is that if the buffer is "managed", it will be freed automatically at some point.
   
   (Note that managed buffers can cause an OOM because of when they are freed: I think it is during fragment shutdown, IIRC. I had to fix a reader bug related to this issue.)
   
   To summarize, it appears that the proper pattern in UDFs is:
   
   * Declare an `@Inject` buffer which the runtime will provide.
   * Resize the buffer if needed, using the return value from `reallocIfNeeded`.
   * Use that buffer as the output buffer by setting it to `out.buffer` where needed.
   * Managed buffers are freed automatically, at some unknown point.
   
   If this is true, there is no reason to update `buffer`, the proper pattern is shown in [DecimalAggrTypeFunctions1](https://github.com/apache/drill/blob/2eb6979206ad1b6f830a070dffa822edbfac6189/exec/java-exec/src/main/codegen/templates/Decimal/DecimalAggrTypeFunctions1.java#L113) (with my own comments):
   
   ``` java
       // Inject means that the runtime will provide a default-size buffer on each call
       @Inject DrillBuf buffer;
       // Holder for output. NOT initialized b the runtime.
       @Output ${type.outputType}Holder out;
       ...
           // Resize the provided buffer, replacing if necessary, and assigning that
           // to the output variable.
           out.buffer = buffer.reallocIfNeeded(len);
           // Using the now-valid output buffer, copy in any needed data.
           out.buffer.setBytes(0, bytes);
           // Upon exit, is the output buffer is now owned by caller? Must be, or the output
           // data would be immediately freed if the caller were to release the injected
           // buffer, and all hell would break loose.
   ```
   
   As usual, there is no documentation of this, you just gotta know (or figure it out every time.)
   
   The buffer management happens in `BufferManagerImpl`, as shown in the JIRA stack trace, which is called by `reallocIfNeeded`:
   
   ```
     public DrillBuf replace(DrillBuf old, int newSize) {
       if (managedBuffers.remove(old.memoryAddress()) == null) {
         throw new IllegalStateException("Tried to remove unmanaged buffer.");
       }
       old.release(1);
       return getManagedBuffer(newSize);
     }
   ```
   
   This is saying, "Hey, that buffer you gave me? It ain't one of mine." The question is why. The theory in this bug is that we forgot to replace our working buffer. Let's check that by doing a [search for `reallocIfNeeded`](https://github.com/apache/drill/search?p=2&q=reallocIfNeeded). We find many cases where we don't save the reallocated buffer back into the `@Inject` variable. For [example](https://github.com/apache/drill/blob/2eb6979206ad1b6f830a070dffa822edbfac6189/exec/java-exec/src/main/codegen/templates/Decimal/DecimalAggrTypeFunctions1.java#L113):
   
   ``` java
           out.buffer = buffer.reallocIfNeeded(len);
           out.buffer.setBytes(0, bytes);
   ```
   
   I *suspect*, but am not sure without running the code, that every call to the UDF is issued a new managed `buffer`. So, I suspect that the problem will only occur if, within a single function call, we:
   
   * Call `reallocIfNeeded` and get a new buffer.
   * Call `reallocIfNeeded` on the original (now unregistered) buffer, but only if the previous call actually resized the buffer.
   
   Looking at the code in this function, there are no loops, so there is no ability to make the above mistake. Yet, many other functions in the file changed here have the exact same code pattern introduced in this fix. Why does the code work so often if many other functions *do not* follow the pattern adopted here?
   
   There are two possibilities:
   
   * The generated code has a bug: it is not providing a new buffer on each function invocation.
   * The problem is not where we think it is: it is elsewhere. (The stack trace in the JIRA ticket shows generated code, so we can't be sure which bit of code is actually making the call.)
   
   In particular, I'm not convinced that the unit test reproduces the error. The unit test is different from the query in the JIRA ticket. A good practice is to run the proposed test *before* any fix to make sure we can reproduce the issue.
   
   The proposed test (simplified):
   
   ``` sql
   select concat(cast(null as varchar), EXPR$0) as c1
               "from (values('foo'), ('bar'))
   ```
   
   User's query (also simplified):
   
   ``` sql
   select concat(BinaryValue, 'aaa') NewVarcharValue from ...;
   ```
   
   What is different? It may not be obvious, but in the first query, we have a `NULL` const and a column (value vector, which happens to be defined by a `VALUES` clause with a constant). In the second, we have a column value and a string constant. That string constant is not stored in a value vector, it is in a temporary buffer (IIRC).
   
   My hunch is that the problem is with the constant: it is causing something to break. This is why so many of the functions in `StringFunctions.java` have code like the proposed fix: they are also likely tested with constants. If the problem is with constants, then it means that all other UDFs that work with strings, but don't have the proposed fix, will fail when used with constants.
   
   That is, sounds like we got us a bit of a problem; this fix is just the tip of the iceberg.
   
   How should we proceed? We can do just this fix, and fix other issues as people encounter and report them. Or, we can figure out what is *really* going on by looking at the generated code. To do that, modify the [Project operator](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java#L279):
   
   ``` java
       boolean saveCode = true; \\ change from false in the master branch
   ```
   
   Set the variable to `true` as shown above (it will be `false` by default.) Run a query like the user's. Code will be generated. By default I think it is in `/tmp/drill/codegen/...`. Analyze what's going on with the `@Inject` buffer.
   
   Pay particular attention to how constants are handled. What is causing us to try to reallocate a non-managed buffer in this case?
   
   As it turns out, with this trick, you can actually *debug* the generated code. A bit after the above line is one that generates the file. Set a breakpoint after that line. Then, go hunt down the generated file. Set a breakpoint somewhere. Let the code run. It will stop at your breakpoint in the generated code. This is easy in Eclipse. IIRC, IntelliJ users have to do some trick to put the code in a path that IntelliJ likes. Jinfeng may have left a note somewhere from how he made it work many years ago.
   
   Then, we can decide how to proceed:
   
   * If the fix here is the correct one, make it in all UDFs that have the wrong pattern. (See the search above, there are **many** such occurrences.) Add a comment to the `reallocIfNeeded` method to explain the proper usage. Provide a code snippet to show what should be done.
   * If the problem is with constant handling, work out what should be done. Fix it. Revert the `out.buffer = buffer = buffer.ralloc...` bits of code in `StringFunctions.java` and retest to ensure constants work correctly.
   
   BTW, in researching these issues, I found a number of other problems.
   
   If the proposed fix is correct, then a particularly nasty buf is [here](https://github.com/apache/drill/blob/2eb6979206ad1b6f830a070dffa822edbfac6189/exec/vector/src/main/java/org/apache/drill/exec/vector/ValueHolderHelper.java#L121) where the new buffer is created internally and never returned.
   
   The function we're trying to fix, `ConcatLeftNullInput`, has a bug:
   
   ``` java
   public static class ConcatLeftNullInput implements DrillSimpleFunc {
       ...
   
       @Override
       public void eval() {
         out.buffer = buffer.reallocIfNeeded( (left.end - left.start) + (right.end - right.start));
   ```
   
   The function is `...LeftNullInput` and, yet, we are asking for the `left.end` and `left.start`. By definition these are zero. Why? Because the left is `NULL`. In fact, if the left value is `NULL`, I'm not sure the `start` and `end` values are even defined to be any particular value. (Amazing this stuff works as well as it does.)
   
   In the same file, this code is *just plain wrong*:
   
   ``` java
   public static class Replace implements DrillSimpleFunc {
       ...
       @Inject DrillBuf buffer;
       @Output VarCharHolder out;
   
       @Override
       public void setup() {
         buffer = buffer.reallocIfNeeded(8000);
       }
   ```
   
   See the bug? What if the new string is larger than 8K? Kaboom. What if the string is 10 characters? Huge waste of memory. The above is what we call "crude and not quite effective", it works most of the time, except when it doesn't.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org