Posted to user@impala.apache.org by Skye Wanderman-Milne <sk...@cloudera.com> on 2016/03/23 17:03:22 UTC

Re: got memory limit exceeded when use UDA

+user@impala.incubator.apache.org

The reason you're running out of memory is that you're overwriting dst's
old buffer every time you allocate and copy a new buffer, so you lose the
pointers to the old buffers and can never free them.

What are you trying to do with this UDA? The idea is that you get the same
'dst' across many CollectSetUpdate/CollectSetMerge calls, and you should
update 'dst' in each call, not overwrite it. See our built-in aggregate
functions for examples, like the Avg function:
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exprs/aggregate-functions.cc#L237
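
As a rough illustration of "update 'dst', don't overwrite it", here is a minimal sketch of an Avg-style aggregate. It is not the code from the linked file: AvgState and the simplified function signatures are hypothetical stand-ins so the example compiles without the Impala headers. In a real UDA the functions would also take a FunctionContext* and use the impala_udf value types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical intermediate state (an assumption for this sketch, not
// Impala's actual type). The same object is passed to every call.
struct AvgState {
  double sum = 0.0;
  int64_t count = 0;
};

// Called once per input row: mutate the existing state in place.
void AvgUpdate(double input, AvgState* dst) {
  assert(dst != nullptr);
  dst->sum += input;
  ++dst->count;
}

// Called to fold one partial state into another: again, accumulate into
// 'dst' rather than replacing it with a copy of 'src'.
void AvgMerge(const AvgState& src, AvgState* dst) {
  assert(dst != nullptr);
  dst->sum += src.sum;
  dst->count += src.count;
}

// Produces the final result from the accumulated state. Returning 0.0 for
// an empty input is a simplification made only for this sketch.
double AvgFinalize(const AvgState& state) {
  return state.count == 0 ? 0.0 : state.sum / static_cast<double>(state.count);
}
```

Because every call accumulates into the state it is handed, nothing is allocated per row and there is no old buffer to lose track of.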

On Wed, Mar 23, 2016 at 12:08 AM, Jerry <44...@qq.com> wrote:

> Hello, all. I have written a UDA to get any one element of a column, but I
> got a "memory limit exceeded" error.
> Here is the code.
> void CollectSetInit(FunctionContext* context, StringVal* val) {
>     val->is_null = true;
> }
> void CollectSetUpdate(FunctionContext* context, const StringVal& input,
> StringVal* val) {
>     if (!dst->is_null) return;
>     if (!src.is_null) {
>         uint8_t* copy = context->Allocate(src.len);
>         memcpy(copy, src.ptr, src.len);
>         *dst = StringVal(copy, src.len);
>         return;
>     }
> }
>
> void CollectSetMerge(FunctionContext* context, const StringVal& src,
> StringVal* dst) {
>     if (!dst->is_null) return;
>     if (!src.is_null) {
>         uint8_t* copy = context->Allocate(src.len);
>         memcpy(copy, src.ptr, src.len);
>         *dst = StringVal(copy, src.len);
>         return;
>     }
> }
> const StringVal CollectSetSerialize(FunctionContext* context, const
> StringVal& val) {
>     if (val.is_null) return val;
>     StringVal result(context, val.len);
>     memcpy(result.ptr, val.ptr, val.len);
>     context->Free(val.ptr);
>     return result;
> }
>
> StringVal CollectSetFinalize(FunctionContext* context, const StringVal&
> val) {
>     if (val.is_null) return val;
>     StringVal result(context, val.len);
>     memcpy(result.ptr, val.ptr, val.len);
>     context->Free(val.ptr);
>     return result;
> }
>
> I use Serialize to free the allocated memory. Is there anything wrong?
> Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Impala User" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to impala-user+unsubscribe@cloudera.org.
>

Re: got memory limit exceeded when use UDA

Posted by Skye Wanderman-Milne <sk...@cloudera.com>.
Ah sorry, just noticed you have "if (!dst->is_null) return;" at the
beginning of your update and merge functions, and didn't realize you were
only trying to return a single non-null value from the set. This does look
like it should only perform a single allocation.

Also, your update function names its parameters 'input' and 'val', but its
body refers to 'src' and 'dst', so I don't think it will compile. Just to
verify, are you definitely running the latest version of your code?
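
For reference, a version of the update function with consistent parameter names could look roughly like the sketch below. MockContext and MockStringVal are hypothetical stand-ins for impala_udf::FunctionContext and StringVal, reduced to what the example needs, and FirstValueUpdate and FirstValueRelease are illustrative names rather than anything from Impala. The allocation counter exists only to make the "single allocation" behavior checkable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for impala_udf::FunctionContext (assumption).
struct MockContext {
  int live_allocations = 0;  // buffers handed out and not yet freed

  uint8_t* Allocate(int len) {
    ++live_allocations;
    return static_cast<uint8_t*>(std::malloc(len > 0 ? len : 1));
  }

  void Free(uint8_t* ptr) {
    if (ptr == nullptr) return;
    --live_allocations;
    std::free(ptr);
  }
};

// Hypothetical stand-in for impala_udf::StringVal (assumption).
struct MockStringVal {
  bool is_null = true;
  int len = 0;
  uint8_t* ptr = nullptr;
};

// Keeps the first non-null input. The parameter names 'src' and 'dst'
// match the names used in the body.
void FirstValueUpdate(MockContext* context, const MockStringVal& src,
                      MockStringVal* dst) {
  assert(context != nullptr && dst != nullptr);
  if (!dst->is_null) return;  // a value is already stored: no new allocation
  if (src.is_null) return;    // nothing to store yet
  uint8_t* copy = context->Allocate(src.len);
  if (src.len > 0) std::memcpy(copy, src.ptr, src.len);
  dst->is_null = false;
  dst->len = src.len;
  dst->ptr = copy;
}

// Releases the intermediate buffer once the value has been copied out,
// which is the role Serialize and Finalize play in the posted code.
void FirstValueRelease(MockContext* context, MockStringVal* dst) {
  assert(context != nullptr && dst != nullptr);
  context->Free(dst->ptr);
  *dst = MockStringVal();
}
```

A merge function would follow the same logic, since it also only needs to keep the first non-null value it sees. With the early return in place, repeated calls on the same 'dst' leave exactly one live buffer.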

Can you also copy the full "memory limit exceeded" message here?

On Wed, Mar 23, 2016 at 7:21 PM, Jerry <44...@qq.com> wrote:

> I'm afraid not.
> For example:
> TABLE test:
> ----
> id
> 1
> 1
> 2
> ----
> select collect_set(id)[0] from test.
> I will get the following in Hive.
> ---
> id
> 1
> ---
> But if I use DISTINCT, I will get this:
> ---
> id
> 1
> 2
> ---
>
> I still want to know why the memory cannot be freed.
> Do the update and merge functions allocate memory only once?
> Thanks
>

Re: got memory limit exceeded when use UDA

Posted by Skye Wanderman-Milne <sk...@cloudera.com>.
Ah, I was unfamiliar with Hive's collect_set UDA, thanks. Unfortunately,
you won't be able to write this UDA in Impala, because it returns an array
(the set of all distinct values in the column), and Impala UDAs can only
return scalar types (i.e. no arrays, maps, or structs).

Can you use the DISTINCT keyword? See
http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_distinct.html.
If you share your query, we can possibly advise on how to use DISTINCT
instead of collect_set.

On Wed, Mar 23, 2016 at 7:00 PM, Jerry <44...@qq.com> wrote:

> Thanks for your reply.
> I use this UDA because I used collect_set(COLUMN)[0] in Hive, but Impala
> doesn't have this UDA.
> I have read the example code of the Avg and Concat functions. However,
> I still cannot figure out why.
> void CollectSetInit(FunctionContext* context, StringVal* val) {
>     val->is_null = true;
> }
> void CollectSetMerge(FunctionContext* context, const StringVal& src,
> StringVal* dst) {
>     if (!dst->is_null) return;
>     if (!src.is_null) {
>         uint8_t* copy = context->Allocate(src.len);
>         memcpy(copy, src.ptr, src.len);
>         *dst = StringVal(copy, src.len);
>         return;
>     }
> }
> The update or merge function will return if dst is not null.
>
> if (!dst->is_null) return;
>
> Does that mean dst's buffer is allocated only once, rather than a new
> buffer being allocated on every call?
> Thanks for your patience in answering.
>