Posted to dev@impala.apache.org by Abhinav Jha <ab...@gmail.com> on 2019/10/03 21:32:01 UTC

Impala crashes with UDA

Hi,

Impala Version: 3.1.0

I have serialized data in Kudu, stored as bytes in a column. This data
is a pure ISO-8859-1 representation of an HLL object, which needs to be
reconstructed back into an HLL object so that the merge can be performed
inside the UDA function. Impala reads these objects as strings, and the
UDA function defined for the merge is string->string to get the
cardinality.

The unit tests run without failure on a single node, even when a large
number of HLLs is passed in. However, when running on a real cluster, the
query dies almost every time. I have been able to isolate the problem to
the Update function.


IMPALA_UDF_EXPORT
void HLLUpdate(FunctionContext* context, const StringVal& src,
               StringVal* result) {
        if (src.is_null || result->is_null) return;

        HLL* hll = reinterpret_cast<HLL*>(result->ptr);

        Builder* b = new Builder(14, 25);
        // This is where the deserialization happens, back from UTF-8
        // to ISO-8859-1.
        vector<char> srcBytes = BytesFromStringVal(src);
        HLL temp = b->build(srcBytes);
        delete b;
        hll->addAll(temp);
}
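
(For reference, the decode the comment refers to would look roughly like
the sketch below. This is only an illustration under the assumption that
every code point fits in ISO-8859-1, i.e. UTF-8 sequences are at most two
bytes; it is not the exact helper.)

std::vector<char> BytesFromStringVal(const StringVal& src) {
        std::vector<char> out;
        out.reserve(src.len);
        for (int i = 0; i < src.len;) {
                unsigned char c = src.ptr[i];
                if (c < 0x80) {
                        // 1-byte sequence (plain ASCII).
                        out.push_back(static_cast<char>(c));
                        ++i;
                } else {
                        // 2-byte sequence; covers code points 0x80-0xFF.
                        unsigned char next = src.ptr[i + 1];
                        out.push_back(static_cast<char>(
                                ((c & 0x1F) << 6) | (next & 0x3F)));
                        i += 2;
                }
        }
        return out;
}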

There might be some memory corruption happening when the query is
distributed across the machines, but I haven't been able to figure out the
root cause yet. Let me know if you need more information from my side.


-- 
Thanks
Abhinav Jha

Re: Impala crashes with UDA

Posted by Tim Armstrong <ta...@cloudera.com>.
Almost certainly a memory management bug.

I can't see how you're allocating the intermediate value in Init() and
freeing it later, but I'd assume that you're allocating enough memory for
the HLL object itself and freeing it in Serialize() and Finalize()
(otherwise you'll have a memory leak).
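
The usual pattern from the Impala UDA samples looks roughly like the
sketch below. HLLState here is a stand-in I'm assuming: a flat,
fixed-size struct with no pointers and no vtable, holding just the HLL
registers.

// Hypothetical flat intermediate state: safe to allocate as one block
// and to copy byte-for-byte.
struct HLLState {
        uint8_t registers[1 << 14];  // assuming precision 14
};

void HLLInit(FunctionContext* context, StringVal* val) {
        val->is_null = false;
        val->len = sizeof(HLLState);
        // Allocate through the context so Impala tracks the memory.
        val->ptr = context->Allocate(val->len);
        if (val->ptr == NULL) return;  // failure is reported on the context
        memset(val->ptr, 0, val->len);
}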

I'm really just speculating since I don't see the rest of your code, but
I'm gonna guess that there's something wrong with the Serialize() and
Finalize() methods. Those need to take all the data associated with the
intermediate value and pack it into the StringVal so it can be sent across
the network or spilled to disk. You almost certainly need to do this unless
HLL is actually just a single contiguous chunk of memory.
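
In sketch form, Serialize() usually looks something like this (again
assuming the flat HLLState above as the intermediate; Finalize() follows
the same shape but also computes the final result before freeing):

StringVal HLLSerialize(FunctionContext* context, const StringVal& val) {
        // Pack the flat state into a StringVal that Impala owns, so it
        // can be shipped to the merging node or spilled to disk.
        StringVal result = StringVal::CopyFrom(context, val.ptr, val.len);
        // Release the buffer allocated in Init().
        context->Free(val.ptr);
        return result;
}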

Actually I guess it's really hard to say much without understanding what
your intermediate value looks like.

I also don't know what the HLL object is. If it's a C++ object with a
vtable that's probably an issue since you're going to be sending the vtable
pointer over the network.

