You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Tanujit Ghosh <ta...@gmail.com> on 2013/08/24 13:35:17 UTC

Implementation of the upper() function

Hi All,

I put together a simple implementation of the upper() string function.

Few questions

1. Is this the correct way, just wanted to understand to understand the
ByteBuf and whether we should be doing the setBytes with the string.
Another obvious optimization that can consider the byte as number and then
intrepret to ascii value and if it falls int he lower change change the
value to higher value.
Just wanted to know what is the way that would be better.

2. It would be helpful to get an explainatin on the bytebuf and how its
wrapped around in the vector and holder objects and how this is supposed to
be shared.


  @FunctionTemplate(name = "uppercase", scope =
FunctionTemplate.FunctionScope.SIMPLE,
              nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
  public static class UpperString implements DrillFunc {

    @Param  VarCharHolder input;
    @Output VarCharHolder out;

    public void setup(RecordBatch b){}

    public void eval(){
      out.buffer = input.buffer;
      out.start = input.start;
      out.end = input.end;

      String inputStr = input.toString().toUpperCase();
      out.buffer.setBytes(input.start, inputStr.getBytes());
    }

    public static class Provider implements CallProvider {
      @Override
      public FunctionDefinition[] getFunctionDefintions() {
        return new FunctionDefinition[] {
          FunctionDefinition.simple("uppercase",
            new BasicArgumentValidator(new
Arg(Types.required(TypeProtos.MinorType.VARCHAR),

 Types.required(TypeProtos.MinorType.VAR16CHAR))),
            new OutputTypeDeterminer.SameAsFirstInput(),
            "upper")
        };
      }
    }
  }


Thanks and Regards,
Tanujit

Re: Implementation of the upper() function

Posted by Ben Becker <bb...@maprtech.com>.
Hi Tanujit,

Responses in-line:

On Sat, Aug 24, 2013 at 4:35 AM, Tanujit Ghosh <ta...@gmail.com>wrote:

> Hi All,
>
> I put together a simple implementation of the upper() string function.
>
> Few questions
>
> 1. Is this the correct way, just wanted to understand to understand the
> ByteBuf and whether we should be doing the setBytes with the string.
> Another obvious optimization that can consider the byte as number and then
> intrepret to ascii value and if it falls int he lower change change the
> value to higher value.
> Just wanted to know what is the way that would be better.
>

The supplied code appears correct.  Note that the current default encoding
for Drill is UTF-8, while the Java String is UTF-16.  Ideally, the upper()
function would operate directly on the UTF-8 encoded ByteBufs and avoid
allocating a temporary string.  But this seems just fine for now.


> 2. It would be helpful to get an explainatin on the bytebuf and how its
> wrapped around in the vector and holder objects and how this is supposed to
> be shared.
>

A *ByteBuf* is a pooled, reference-counted resource (we actually use the *
PooledByteBuf*).
A *ValueVector* reference a ByteBuf, which stores a vector of values for a
single field.
A *VectorHolder* can be thought of as a reference to a single value in the
ValueVector.
When a ValueVector is 'done' with the ByteBuf, the buffer's reference count
is decremented and can be reclaimed in the pool.  See
BaseDataValueVector.clear().


>
>
>   @FunctionTemplate(name = "uppercase", scope =
> FunctionTemplate.FunctionScope.SIMPLE,
>               nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
>   public static class UpperString implements DrillFunc {
>
>     @Param  VarCharHolder input;
>     @Output VarCharHolder out;
>
>     public void setup(RecordBatch b){}
>
>     public void eval(){
>       out.buffer = input.buffer;
>       out.start = input.start;
>       out.end = input.end;
>
>       String inputStr = input.toString().toUpperCase();
>       out.buffer.setBytes(input.start, inputStr.getBytes());
>     }
>
>     public static class Provider implements CallProvider {
>       @Override
>       public FunctionDefinition[] getFunctionDefintions() {
>         return new FunctionDefinition[] {
>           FunctionDefinition.simple("uppercase",
>             new BasicArgumentValidator(new
> Arg(Types.required(TypeProtos.MinorType.VARCHAR),
>
>  Types.required(TypeProtos.MinorType.VAR16CHAR))),
>

I'm not sure you need to worry about VAR16CHAR for now, but you may want to
allow Types.optional(VARCHAR).  The NULL_IF_NULL declaration will handle
null values automatically.


>             new OutputTypeDeterminer.SameAsFirstInput(),
>             "upper")
>         };
>       }
>     }
>   }
>
>
> Thanks and Regards,
> Tanujit
>


Regards,
-Ben