You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2021/04/01 07:02:42 UTC

[GitHub] [drill] paul-rogers commented on pull request #2183: DRILL-6547: IllegalStateException: Tried to remove unmanaged buffer in concat function

paul-rogers commented on pull request #2183:
URL: https://github.com/apache/drill/pull/2183#issuecomment-811694253


   @oleg-zinovev, sorry for the delay. For some reason, Github does not notify me of new comments, even though I'm a "watcher." Odd.
   
   Thanks also for the explanation and generated code. I agree with your analysis. If we ever do a "revised edition" of the Drill book, we should explain this detail.
   
   I wonder: is this the only remaining function that has this problem? When I quickly glanced through the UDFs, it seemed some used the pattern like the one you are using here, others use the "wrong" pattern. Should we fix all of them to avoid having to fix them one by one?
   
   Also, the reason that I said that one of the VARCHAR fields would probably be null is because of the name of the function:
   
   ``` java
     public static class ConcatRightNullInput implements DrillSimpleFunc {
   ```
   
   We seem to also have:
   
   ``` java
     public static class ConcatLeftNullInput implements DrillSimpleFunc {
   ```
   
   Which, by the way, has the same bug you are fixing here.
   
   There is also:
   
   ``` java
     public static class ConcatBothNullInput implements DrillSimpleFunc {
   ```
   
   For which, presumably, both inputs are NULL (zero length), but we still allocate a buffer, and that code has the bug you are fixing. Of course, if both inputs are NULL, then there length is undefined (NULL says to ignore the value; we cannot assume that the string value is the empty string or that the start/end offsets are valid.)
   
   Finally, we also have:
   
   ``` java
     public static class ConcatOperator implements DrillSimpleFunc {
   ```
   
   Which does use both inputs, and which already has the fix you propose here.
   
   This is what I meant: we have multiple variations of the same function, some are fixed, some are still broken. The variations claim that one or both sides are NULL, but we are still working with the (possibly invalid) VARCHARs in those cases, when we should not.
   
   Now, because things are not complex enough, we have another issue. If either of the arguments of `CONCAT(left, right)` are the SQL NULL value, then the output should be... the non-null argument? Or, NULL? At least according to [SQL Server](https://docs.microsoft.com/en-us/sql/t-sql/functions/concat-transact-sql?view=sql-server-ver15), the rule is that a NULL value is treated as an empty string. We should thus modify the three `Null` variations to do that: use an empty string, *not* the value of the null argument. That is, for `ConcatRightNullInput`, return the left input. For `ConcatLeftNullInput`, return the right input. For `ConcatBothNullInput`, return an empty (zero-length) string.
   
   I hope this clarifies things a bit!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org